@@ -129,6 +129,17 @@ marks the same across runs.
for intermediary filters (e.g. for rewriting commit messages
which refer to older commits, or for stripping blobs by id).
+--always-show-modify-after-rename::
+ When a rename is detected, fast-export normally issues both an
+ 'R' (rename) and an 'M' (modify) directive. However, if the
+ contents of the old and new filename match exactly, it will
+ only issue the rename directive. Use this flag to have it
+ always issue the modify directive after the rename, which may
+ be useful for tools which are using the fast-export stream as
+ a mechanism for gathering statistics about a repository. Note
+ that this option only has effect when rename detection is
+ active (see the -M option).
+
--refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
@@ -38,6 +38,7 @@ static int use_done_feature;
static int no_data;
static int full_tree;
static int reference_excluded_commits;
+static int always_show_modify_after_rename;
static int show_original_ids;
static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
@@ -407,7 +408,8 @@ static void show_filemodify(struct diff_queue_struct *q,
putchar('\n');
if (oideq(&ospec->oid, &spec->oid) &&
- ospec->mode == spec->mode)
+ ospec->mode == spec->mode &&
+ !always_show_modify_after_rename)
break;
}
/* fallthrough */
@@ -1105,6 +1107,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
&reference_excluded_commits, N_("Reference parents which are not in fast-export stream by sha1sum")),
OPT_BOOL(0, "show-original-ids", &show_original_ids,
N_("Show original sha1sums of blobs/commits")),
+ OPT_BOOL(0, "always-show-modify-after-rename",
+ &always_show_modify_after_rename,
+ N_("Always provide 'M' directive after 'R'")),
OPT_END()
};
@@ -630,4 +630,40 @@ test_expect_success 'merge commit gets exported with --import-marks' '
)
'
+test_expect_success 'rename detection and --always-show-modify-after-rename' '
+ test_create_repo renames &&
+ (
+ cd renames &&
+ test_seq 0 9 >single_digit &&
+ test_seq 10 98 >double_digit &&
+ git add . &&
+ git commit -m initial &&
+
+ echo 99 >>double_digit &&
+ git mv single_digit single-digit &&
+ git mv double_digit double-digit &&
+ git add double-digit &&
+ git commit -m renames &&
+
+ # First, check normal fast-export -M output
+ git fast-export -M --no-data master >out &&
+
+ grep double-digit out >out2 &&
+ test_line_count = 2 out2 &&
+
+ grep single-digit out >out2 &&
+ test_line_count = 1 out2 &&
+
+ # Now, test with --always-show-modify-after-rename; should
+ # have an extra "M" directive for "single-digit".
+ git fast-export -M --no-data --always-show-modify-after-rename master >out &&
+
+ grep double-digit out >out2 &&
+ test_line_count = 2 out2 &&
+
+ grep single-digit out >out2 &&
+ test_line_count = 2 out2
+ )
+'
+
test_done
I wanted a way to gather all the following information efficiently
(with as few history traversals as possible):
  * Get all blob sizes
  * Map blob shas to filename(s) they appeared under in the history
  * Find when files and directories were deleted (and whether they
    were later reinstated, since that means they aren't actually gone)
  * Find sets of filenames referring to the same logical 'file'
    (e.g. foo->bar in commit A and bar->baz in commit B mean that
    {foo,bar,baz} refer to the same 'file', so someone wanting to just
    "keep baz and its history" needs all versions of those three
    filenames). I need to know about things like another foo or bar
    being introduced after the rename though, since that breaks the
    connection between filenames.
and then I would generate various aggregations on the data and display
some type of report for the user.

The only way I know of to get blob sizes is via
    cat-file --batch-all-objects --batch-check

The rest of the data would traditionally be gathered from a log command,
e.g.

    git log --format='%H%n%P%n%cd' --date=short --topo-order --reverse \
        -M --diff-filter=RAMD --no-abbrev --raw -c

however, parsing log output seems slightly dangerous given that it is a
porcelain command. While we have specified --format and --raw to try
to avoid the most obvious problems, I'm still slightly concerned about
--date=short, the combinations of --raw and -c, options that might
colorize the output, and also the --diff-filter (there is no current
option named --no-find-copies or --no-break-rewrites, but what if those
turn on by default in the future much as we changed the default with
detecting renames?). Each of those is a small worry, but they add up.

A command meant for data serialization, such as fast-export, seems like
a better candidate for this job. There's just one missing item: in order
to connect blob sizes to filenames, I need fast-export to tell me the
blob sha1sum of any file changes. It does this for modifies, but not
always for renames. In particular, if a file is a 100% rename, it only
prints
    R oldname newname
instead of
    R oldname newname
    M 100644 $SHA1 newname
as occurs when there is a rename+modify. Add an option which allows us
to force the latter output even when commits have exact renames of
files.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-export.txt | 11 ++++++++++
 builtin/fast-export.c             |  7 +++++-
 t/t9350-fast-export.sh            | 36 +++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 1 deletion(-)
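
To illustrate how a statistics tool might consume the output described above, here is a minimal sketch of a filter that maps blob ids to filenames. The helper name `list_modified_blobs` and the blob ids in the sample stream are made up for illustration. The sketch assumes the stream was produced with --no-data (so the third field of an 'M' line is a blob id rather than a mark) and that paths contain no whitespace or quoting. It is also line-based, so a commit message line that happens to look like an 'M' directive would be misread; a real consumer would need to honor the `data <n>` byte counts.

```shell
#!/bin/sh
# Hypothetical helper, not part of git: read a fast-export stream on
# stdin and print "<blob-id> <path>" for every 'M' (modify) directive.
# Assumes --no-data output and paths without whitespace or quoting.
list_modified_blobs () {
	awk '$1 == "M" && NF == 4 { print $3, $4 }'
}

# With the proposed option applied, a real invocation might look like:
#   git fast-export -M --no-data --always-show-modify-after-rename master |
#   list_modified_blobs
#
# Self-contained demonstration on a hand-written stream fragment.
# The blob ids below are placeholders, not real objects.
list_modified_blobs <<\EOF
commit refs/heads/master
mark :2
data 8
renames
R single_digit single-digit
M 100644 1111111111111111111111111111111111111111 single-digit
R double_digit double-digit
M 100644 2222222222222222222222222222222222222222 double-digit
EOF
```

Without the new option, the 'M' line for single-digit would be absent from the stream for an exact rename, so the filter would have no blob id to report for that path. The resulting id/path pairs could then be joined against the sizes reported by cat-file --batch-all-objects --batch-check.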