describe: refresh the index when 'broken' flag is used

Message ID 20240623214301.143796-1-abhijeet.nkt@gmail.com (mailing list archive)
State Superseded

Commit Message

Abhijeet Sonar June 23, 2024, 9:42 p.m. UTC
When describe is run with 'dirty' flag, we refresh the index
to make sure it is in sync with the filesystem before
determining if the working tree is dirty.  However, this is
not done for the codepath where the 'broken' flag is used.

This causes `git describe --broken --dirty` to false
positively report the worktree being dirty.  Refreshing the
index before running diff-index fixes the problem.

Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com>
Reported-by: Paul Millar <paul.millar@desy.de>
Suggested-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/describe.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Abhijeet Sonar June 23, 2024, 9:56 p.m. UTC | #1
This is my first ever patch submission and I am excited to contribute,
even if it is such a little thing! Please let me know of any etiquette
or convention violations; I will do my best to follow the conventions
moving forward.

Thanks

On 24/06/24 03:12, Abhijeet Sonar wrote:
> When describe is run with 'dirty' flag, we refresh the index
> to make sure it is in sync with the filesystem before
> determining if the working tree is dirty.  However, this is
> not done for the codepath where the 'broken' flag is used.
>
> This causes `git describe --broken --dirty` to false
> positively report the worktree being dirty.  Refreshing the
> index before running diff-index fixes the problem.
>
> Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com>
> Reported-by: Paul Millar <paul.millar@desy.de>
> Suggested-by: Junio C Hamano <gitster@pobox.com>
> ---
>   builtin/describe.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
>
> diff --git a/builtin/describe.c b/builtin/describe.c
> index e5287eddf2..2b443c155e 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -645,6 +645,20 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
>   	if (argc == 0) {
>   		if (broken) {
>   			struct child_process cp = CHILD_PROCESS_INIT;
> +			struct lock_file index_lock = LOCK_INIT;
> +			int fd;
> +
> +			setup_work_tree();
> +			prepare_repo_settings(the_repository);
> +			repo_read_index(the_repository);
> +			refresh_index(the_repository->index, REFRESH_QUIET|REFRESH_UNMERGED,
> +				      NULL, NULL, NULL);
> +			fd = repo_hold_locked_index(the_repository,
> +						    &index_lock, 0);
> +			if (0 <= fd)
> +				repo_update_index_if_able(the_repository, &index_lock);
> +
> +
>   			strvec_pushv(&cp.args, diff_index_args);
>   			cp.git_cmd = 1;
>   			cp.no_stdin = 1;
karthik nayak June 24, 2024, 10:56 a.m. UTC | #2
Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:

> When describe is run with 'dirty' flag, we refresh the index
> to make sure it is in sync with the filesystem before
> determining if the working tree is dirty.  However, this is
> not done for the codepath where the 'broken' flag is used.
>
> This causes `git describe --broken --dirty` to false
> positively report the worktree being dirty.  Refreshing the
> index before running diff-index fixes the problem.
>
> Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com>
> Reported-by: Paul Millar <paul.millar@desy.de>
> Suggested-by: Junio C Hamano <gitster@pobox.com>
> ---
>  builtin/describe.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/builtin/describe.c b/builtin/describe.c
> index e5287eddf2..2b443c155e 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -645,6 +645,20 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
>  	if (argc == 0) {
>  		if (broken) {
>  			struct child_process cp = CHILD_PROCESS_INIT;
> +			struct lock_file index_lock = LOCK_INIT;
> +			int fd;
> +
> +			setup_work_tree();
> +			prepare_repo_settings(the_repository);
> +			repo_read_index(the_repository);
> +			refresh_index(the_repository->index, REFRESH_QUIET|REFRESH_UNMERGED,
> +				      NULL, NULL, NULL);
> +			fd = repo_hold_locked_index(the_repository,
> +						    &index_lock, 0);
> +			if (0 <= fd)
> +				repo_update_index_if_able(the_repository, &index_lock);
> +
> +
>

I'm wondering why this needs to be done, as I can see, when we use the
'--broken' flag, we create a child process to run `git diff-index
--quiet HEAD`. As such, we shouldn't have to refresh the index here.

Could you perhaps state how you can reproduce the issue mentioned?

Also apart from that, we should add a test to capture the changes.

>  			cp.git_cmd = 1;
>  			cp.no_stdin = 1;
> --
> 2.45.GIT
Phillip Wood June 24, 2024, 11:20 a.m. UTC | #3
Hi Abhijeet and Karthik

On 24/06/2024 11:56, Karthik Nayak wrote:
> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:
> 
>> When describe is run with 'dirty' flag, we refresh the index
>> to make sure it is in sync with the filesystem before
>> determining if the working tree is dirty.  However, this is
>> not done for the codepath where the 'broken' flag is used.
>>
>> This causes `git describe --broken --dirty` to false
>> positively report the worktree being dirty.  Refreshing the
>> index before running diff-index fixes the problem.

This is a good description of the problem the patch fixes.

>> Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com>
>> Reported-by: Paul Millar <paul.millar@desy.de>
>> Suggested-by: Junio C Hamano <gitster@pobox.com>
>> ---
>>   builtin/describe.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/builtin/describe.c b/builtin/describe.c
>> index e5287eddf2..2b443c155e 100644
>> --- a/builtin/describe.c
>> +++ b/builtin/describe.c
>> @@ -645,6 +645,20 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
>>   	if (argc == 0) {
>>   		if (broken) {
>>   			struct child_process cp = CHILD_PROCESS_INIT;
>> +			struct lock_file index_lock = LOCK_INIT;
>> +			int fd;
>> +
>> +			setup_work_tree();
>> +			prepare_repo_settings(the_repository);
>> +			repo_read_index(the_repository);
>> +			refresh_index(the_repository->index, REFRESH_QUIET|REFRESH_UNMERGED,
>> +				      NULL, NULL, NULL);
>> +			fd = repo_hold_locked_index(the_repository,
>> +						    &index_lock, 0);
>> +			if (0 <= fd)
>> +				repo_update_index_if_able(the_repository, &index_lock);
>> +

As we're dealing with a repository that might be broken I suspect we'd 
be better to run "git update-index --unmerged -q --refresh" as a 
subprocess in the same way that we run "git diff-index" so that "git 
describe --broken" does not die if the index cannot be refreshed.

> I'm wondering why this needs to be done, as I can see, when we use the
> '--broken' flag, we create a child process to run `git diff-index
> --quiet HEAD`. As such, we shouldn't have to refresh the index here.

"git diff-index" and "git diff-files" do not refresh the index. This is 
by design so that a script can refresh the index once and run "git 
diff-index" several times without wasting time updating the index each time.
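
The behaviour described above can be sketched in a throwaway repository. This is an illustrative sketch, not part of the patch: the temporary directory, the README file name, and the commit identity are placeholders chosen for the example. It rewrites a tracked file with identical content, so only its stat data changes, and then compares the exit status of "git diff-index --quiet HEAD" before and after an explicit refresh.

```shell
#!/bin/sh
# Sketch only: a stat-only change makes diff-index report a difference
# until the index is refreshed. All names below are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
echo "Hello, world" >README
git add README
git -c user.name=Example -c user.email=example@example.com \
	commit -q -m "Initial commit"

# Rewrite the file with identical content; the contents are unchanged,
# but the inode and timestamps recorded in the index no longer match.
cat README >README.tmp && mv README.tmp README

# diff-index does not refresh the index itself, so it flags the file.
git diff-index --quiet HEAD
before=$?

# Refresh once, the way a script is expected to, then ask again.
git update-index --unmerged -q --refresh
git diff-index --quiet HEAD
after=$?

echo "exit status before refresh: $before, after refresh: $after"
```

A non-zero status before the refresh and a zero status after it is the pattern that leads "git describe --broken --dirty" to append "-dirty" when nothing refreshes the index first.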

> Also apart from that, we should add a test to capture the changes.

That would be nice

Best Wishes

Phillip


>>   			cp.git_cmd = 1;
>>   			cp.no_stdin = 1;
>> --
>> 2.45.GIT
Abhijeet Sonar June 24, 2024, 12:20 p.m. UTC | #4
On 24/06/24 16:50, Phillip Wood wrote:

> This is a good description of the problem the patch fixes.

Thanks!

> As we're dealing with a repository that might be broken I suspect we'd 
> be better to run "git update-index --unmerged -q --refresh" as a 
> subprocess in the same way that we run "git diff-index" so that "git 
> describe --broken" does not die if the index cannot be refreshed.

I see, that makes sense.  I will change it to launch `update-index` in a
sub-process instead.

>> Also apart from that, we should add a test to capture the changes.
> That would be nice
Got it, I will add some tests as well.

Thanks
Abhijeet Sonar June 24, 2024, 1:48 p.m. UTC | #5
I have a question:

I would like to change the owner of a file
in the test case I am writing -- an operation
that requires super-user privileges.  I am not
sure if it is okay to do that in tests.  Since
that would require running tests with `sudo`.

What would be the correct way to do this?
Junio C Hamano June 24, 2024, 4:34 p.m. UTC | #6
Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:

> I would like to change the owner of a file
> in the test case I am writing -- an operation
> that requires super-user privileges.  I am not
> sure if it is okay to do that in tests.  Since
> that would require running tests with `sudo`.

What is the reason why you want to change the owner of a file in
your test?

If it is merely to make sure you cannot write to the .git/index
file, temporarily doing chmod of the .git directory in a test (with
POSIXPERM prerequisite) may be one way to do so, and you do not need
a second user on the system the test is running on.

Or if you pretend that you have a second process that is holding the
lock in .git/index by creating .git/index.lock file yourself, that
would also prevent your tested command from touching the index.

The latter approach would result in a test that may look like so (I
am writing this in my mail client, and I expect there may be some
fix ups needed):

test_expect_success 'see what --broken does upon unwritable index' '
	test_when_finished "rm -f .git/index.lock" &&
	test_commit A A.file &&
	echo changed >>A.file &&
	>.git/index.lock &&
	test_must_fail git describe --dirty >actual 2>error &&
	test_grep "could not write index" error &&
	git describe --broken --dirty >actual 2>error &&
	test_grep ! "could not write index" error &&
	echo ...expected.describe.result... >expect &&
	test_cmp expect actual
'

HTH.
Abhijeet Sonar June 24, 2024, 5:37 p.m. UTC | #7
> What is the reason why you want to change the owner of a file in
> your test?
>
> If it is merely to make sure you cannot write to the .git/index
> file, temporarily doing chmod of the .git directory in a test (with
> POSIXPERM prerequisite) may be one way to do so, and you do not need
> a second user on the system the test is running on.

I want to change the owner of a checked-in file and not the `.git`
directory.  This is because of what you noted in an earlier message:

> As many attributes of each file (like the
> last modified timestamp and who owns the file) are recorded in the
> index for files that were verified to be unmodified (this is done so
> that by doing lstat() on a path and comparing the result with the
> information saved in the index, we can notice that the path was
> modified without actually opening the file and looking at the
> contents), after doing something (like "git diff") that causes this
> information updated while the files appear to be owned by you
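
The stat data referred to in the quote above can be inspected directly. The following is a minimal sketch, assuming a throwaway repository with a placeholder file named README; it relies on "git ls-files --debug", whose output is intended for manual inspection and whose exact format may change between Git versions.

```shell
#!/bin/sh
# Sketch only: show the stat data Git caches in the index for one file.
# The temporary directory and the README file name are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
echo "Hello, world" >README
git add README

# --debug prints the cached ctime, mtime, dev, ino, uid, gid and size
# for each index entry, after the line naming the path.
index_stat=$(git ls-files --debug README)
echo "$index_stat"

# The owner recorded in the index can be compared with the current user.
echo "current uid: $(id -u)"
```

If a later lstat() of the file returns a different owner or timestamp than the values shown here, the entry is treated as possibly modified until the index is refreshed.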

Currently, `git describe --dirty --broken` reports the working tree as
dirty if you change the owner of a file.  And as Phillip pointed out,
calling `git update-index --unmerged -q --refresh` to update the index
fixes this.

What I want to test looks something like this:

# initially, the file is owned by a non-root user
chown root file
git describe --dirty --broken  # incorrectly suffixes the output with '-dirty'

As mentioned earlier, the dirty suffix goes away if the index is
refreshed before running describe.  This is what I really want to
assert -- that there is no '-dirty' suffix when the owner of a file is changed.
This kind of simulates the scenario where `git describe` is run in a
docker container as was originally reported by Paul:

> mkdir test-container
> cd test-container
> cat >Dockerfile <<EOF
> FROM docker.io/debian:bookworm-slim
> WORKDIR /work
> RUN apt-get update && apt-get -y install git
> EOF
> podman build -t test-image .
>
> mkdir test-repo
> cd test-repo
> git init
> echo "Hello, world" > README
> git add README
> git commit -m "Initial commit" README
> git tag v1.0.0
>
> git describe --tags --dirty --broken
>
> podman run -v `pwd`:/work --rm -it --entrypoint '["/usr/bin/git",
>    "describe", "--tags", "--dirty", "--broken"]' test-image


Thanks
Junio C Hamano June 24, 2024, 5:58 p.m. UTC | #8
Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:

> Currently, `git describe --dirty --broken` reports the working tree as
> dirty if you change the owner of a file.  And as Phillip pointed out,
> calling `git update-index --unmerged -q --refresh` to update the index
> fixes this.

Starting from a clean state with a tracked file COPYING, I can do
this:

    $ git describe --dirty --broken
    v2.45.2-862-g39ba10deb2
    $ cat COPYING >RENAMING && mv RENAMING COPYING
    $ git diff-index --abbrev=8 HEAD
    :100644 100644 536e5552 00000000 M	COPYING
    $ git describe --dirty --broken
    v2.45.2-862-g39ba10deb2-dirty
    $ git describe --dirty
    v2.45.2-862-g39ba10deb2

This is with a version of Git _without_ your fix, i.e. the one whose
"describe --broken --dirty" does not do "git update-index --refresh".

In other words, the stat-only change to cause "diff-index" to report
a "suspected to be modified" does not have to be that the file is
owned by a different owner.  So I still do not understand why you
want a second user in this test.
Abhijeet Sonar June 25, 2024, 6:40 a.m. UTC | #9
> So I still do not understand why you
> want a second user in this test.

What I really wanted to do was closely mirror the environment in the
reproduction steps from the original bug report, which I figured
could be done by changing the owner to a second user.


On 24/06/24 23:28, Junio C Hamano wrote:

>      $ git describe --dirty --broken
>      v2.45.2-862-g39ba10deb2
>      $ cat COPYING >RENAMING && mv RENAMING COPYING
>      $ git diff-index --abbrev=8 HEAD
>      :100644 100644 536e5552 00000000 M	COPYING
>      $ git describe --dirty --broken
>      v2.45.2-862-g39ba10deb2-dirty
>      $ git describe --dirty
>      v2.45.2-862-g39ba10deb2

Thanks, I will use this in the tests.
karthik nayak June 25, 2024, 12:28 p.m. UTC | #10
Phillip Wood <phillip.wood123@gmail.com> writes:

> Hi Abhijeet and Karthik
>
> On 24/06/2024 11:56, Karthik Nayak wrote:
>> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:
>>
>>> When describe is run with 'dirty' flag, we refresh the index
>>> to make sure it is in sync with the filesystem before
>>> determining if the working tree is dirty.  However, this is
>>> not done for the codepath where the 'broken' flag is used.
>>>
>>> This causes `git describe --broken --dirty` to false
>>> positively report the worktree being dirty.  Refreshing the
>>> index before running diff-index fixes the problem.
>
> This is a good description of the problem the patch fixes.
>
>>> Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com>
>>> Reported-by: Paul Millar <paul.millar@desy.de>
>>> Suggested-by: Junio C Hamano <gitster@pobox.com>
>>> ---
>>>   builtin/describe.c | 14 ++++++++++++++
>>>   1 file changed, 14 insertions(+)
>>>
>>> diff --git a/builtin/describe.c b/builtin/describe.c
>>> index e5287eddf2..2b443c155e 100644
>>> --- a/builtin/describe.c
>>> +++ b/builtin/describe.c
>>> @@ -645,6 +645,20 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
>>>   	if (argc == 0) {
>>>   		if (broken) {
>>>   			struct child_process cp = CHILD_PROCESS_INIT;
>>> +			struct lock_file index_lock = LOCK_INIT;
>>> +			int fd;
>>> +
>>> +			setup_work_tree();
>>> +			prepare_repo_settings(the_repository);
>>> +			repo_read_index(the_repository);
>>> +			refresh_index(the_repository->index, REFRESH_QUIET|REFRESH_UNMERGED,
>>> +				      NULL, NULL, NULL);
>>> +			fd = repo_hold_locked_index(the_repository,
>>> +						    &index_lock, 0);
>>> +			if (0 <= fd)
>>> +				repo_update_index_if_able(the_repository, &index_lock);
>>> +
>
> As we're dealing with a repository that might be broken I suspect we'd
> be better to run "git update-index --unmerged -q --refresh" as a
> subprocess in the same way that we run "git diff-index" so that "git
> describe --broken" does not die if the index cannot be refreshed.
>
>> I'm wondering why this needs to be done, as I can see, when we use the
>> '--broken' flag, we create a child process to run `git diff-index
>> --quiet HEAD`. As such, we shouldn't have to refresh the index here.
>
> "git diff-index" and "git diff-files" do not refresh the index. This is
> by design so that a script can refresh the index once and run "git
> diff-index" several times without wasting time updating the index each time.
>

I see. Thanks for correcting me!

>> Also apart from that, we should add a test to capture the changes.
>
> That would be nice
>
> Best Wishes
>
> Phillip
>
>
>>>   			cp.git_cmd = 1;
>>>   			cp.no_stdin = 1;
>>> --
>>> 2.45.GIT

Patch

diff --git a/builtin/describe.c b/builtin/describe.c
index e5287eddf2..2b443c155e 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -645,6 +645,20 @@  int cmd_describe(int argc, const char **argv, const char *prefix)
 	if (argc == 0) {
 		if (broken) {
 			struct child_process cp = CHILD_PROCESS_INIT;
+			struct lock_file index_lock = LOCK_INIT;
+			int fd;
+
+			setup_work_tree();
+			prepare_repo_settings(the_repository);
+			repo_read_index(the_repository);
+			refresh_index(the_repository->index, REFRESH_QUIET|REFRESH_UNMERGED,
+				      NULL, NULL, NULL);
+			fd = repo_hold_locked_index(the_repository,
+						    &index_lock, 0);
+			if (0 <= fd)
+				repo_update_index_if_able(the_repository, &index_lock);
+
+
 			strvec_pushv(&cp.args, diff_index_args);
 			cp.git_cmd = 1;
 			cp.no_stdin = 1;