Message ID | 20181116110356.12311-9-phillip.wood@talktalk.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | diff --color-moved-ws fixes and enhancment | expand |
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood <phillip.wood@talktalk.net> wrote: > > From: Phillip Wood <phillip.wood@dunelm.org.uk> > > Currently diff --color-moved-ws=allow-indentation-change does not > support indentation that contains a mix of tabs and spaces. For > example in commit 546f70f377 ("convert.h: drop 'extern' from function > declaration", 2018-06-30) the function parameters in the following > lines are not colored as moved [1]. > > -extern int stream_filter(struct stream_filter *, > - const char *input, size_t *isize_p, > - char *output, size_t *osize_p); > +int stream_filter(struct stream_filter *, > + const char *input, size_t *isize_p, > + char *output, size_t *osize_p); > > This commit changes the way the indentation is handled to track the > visual size of the indentation rather than the characters in the > indentation. This has they benefit that any whitespace errors do not s/they/the/ > interfer with the move detection (the whitespace errors will still be > highlighted according to --ws-error-highlight). During the discussion > of this feature there were concerns about the correct detection of > indentation for python. However those concerns apply whether or not > we're detecting moved lines so no attempt is made to determine if the > indentation is 'pythonic'. > > [1] Note that before the commit to fix the erroneous coloring of moved > lines each line was colored as a different block, since that commit > they are uncolored. > > Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> > --- > > Notes: > Changes since rfc: > - It now replaces the existing implementation rather than adding a new > mode. > - The indentation deltas are now calculated once for each line and > cached. > - Optimized the whitespace delta comparison to compare string lengths > before comparing the actual strings. > - Modified the calculation of tabs as suggested by Stefan. > - Split out the blank line handling into a separate commit as suggest > by Stefan. > - Fixed some comments pointed out by Stefan. > > diff.c | 130 +++++++++++++++++++++---------------- > t/t4015-diff-whitespace.sh | 56 ++++++++++++++++ > 2 files changed, 129 insertions(+), 57 deletions(-) > > diff --git a/diff.c b/diff.c > index c378ce3daf..89559293e7 100644 > --- a/diff.c > +++ b/diff.c > @@ -750,6 +750,8 @@ struct emitted_diff_symbol { > const char *line; > int len; > int flags; > + int indent_off; > + int indent_width; So this is the trick how we compute the ws related data only once per line. :-) On the other hand, we do not save memory by disabling the ws detection, but I guess that is not a problem for now. Would it make sense to have the new variables be unsigned? (Also a comment on what they are, I needed to read the code to understand off to be offset into the line, where the content starts, and width to be the visual width, as I did not recall the RFC.) > +static void fill_es_indent_data(struct emitted_diff_symbol *es) > [...] > + if (o->color_moved_ws_handling & > + COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) > + fill_es_indent_data(&o->emitted_symbols->buf[n]); Nice. By reducing the information kept around to ints, we also do not need to alloc/free memory for each line. > +++ b/t/t4015-diff-whitespace.sh > @@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta incompatible with other space opti > test_i18ngrep allow-indentation-change err > ' > > +test_expect_success 'compare mixed whitespace delta across moved blocks' ' Looks good, Thanks! Stefan
Hi Stefan On 16/11/2018 21:47, Stefan Beller wrote: > On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood <phillip.wood@talktalk.net> wrote: >> >> From: Phillip Wood <phillip.wood@dunelm.org.uk> >> >> Currently diff --color-moved-ws=allow-indentation-change does not >> support indentation that contains a mix of tabs and spaces. For >> example in commit 546f70f377 ("convert.h: drop 'extern' from function >> declaration", 2018-06-30) the function parameters in the following >> lines are not colored as moved [1]. >> >> -extern int stream_filter(struct stream_filter *, >> - const char *input, size_t *isize_p, >> - char *output, size_t *osize_p); >> +int stream_filter(struct stream_filter *, >> + const char *input, size_t *isize_p, >> + char *output, size_t *osize_p); >> >> This commit changes the way the indentation is handled to track the >> visual size of the indentation rather than the characters in the >> indentation. This has they benefit that any whitespace errors do not > > s/they/the/ Thanks, well spotted > >> interfer with the move detection (the whitespace errors will still be >> highlighted according to --ws-error-highlight). During the discussion >> of this feature there were concerns about the correct detection of >> indentation for python. However those concerns apply whether or not >> we're detecting moved lines so no attempt is made to determine if the >> indentation is 'pythonic'. >> >> [1] Note that before the commit to fix the erroneous coloring of moved >> lines each line was colored as a different block, since that commit >> they are uncolored. >> >> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> >> --- >> >> Notes: >> Changes since rfc: >> - It now replaces the existing implementation rather than adding a new >> mode. >> - The indentation deltas are now calculated once for each line and >> cached. >> - Optimized the whitespace delta comparison to compare string lengths >> before comparing the actual strings. >> - Modified the calculation of tabs as suggested by Stefan. >> - Split out the blank line handling into a separate commit as suggest >> by Stefan. >> - Fixed some comments pointed out by Stefan. >> >> diff.c | 130 +++++++++++++++++++++---------------- >> t/t4015-diff-whitespace.sh | 56 ++++++++++++++++ >> 2 files changed, 129 insertions(+), 57 deletions(-) >> >> diff --git a/diff.c b/diff.c >> index c378ce3daf..89559293e7 100644 >> --- a/diff.c >> +++ b/diff.c >> @@ -750,6 +750,8 @@ struct emitted_diff_symbol { >> const char *line; >> int len; >> int flags; >> + int indent_off; >> + int indent_width; > > So this is the trick how we compute the ws related > data only once per line. :-) > > On the other hand, we do not save memory by disabling > the ws detection, but I guess that is not a problem for now. I did wonder about that, but decided the increase was small compared to all the strings that are copied when creating the emitted_diff_symbols. If we want to save memory then we should stop struct emitted_diff_symbol() from carrying a copy of all the strings. > Would it make sense to have the new variables be > unsigned? (Also a comment on what they are, I > needed to read the code to understand off to be > offset into the line, where the content starts, and > width to be the visual width, as I did not recall > the RFC.) Yes a comment would make sense. I don't think I have a strong preference for signed/unsigned, I can change it if you want. Thanks for looking at these so promptly Best Wishes Phillip >> +static void fill_es_indent_data(struct emitted_diff_symbol *es) >> [...] > >> + if (o->color_moved_ws_handling & >> + COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) >> + fill_es_indent_data(&o->emitted_symbols->buf[n]); > > Nice. > > By reducing the information kept around to ints, we also do not need > to alloc/free > memory for each line. > >> +++ b/t/t4015-diff-whitespace.sh >> @@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta incompatible with other space opti >> test_i18ngrep allow-indentation-change err >> ' >> >> +test_expect_success 'compare mixed whitespace delta across moved blocks' ' > > Looks good, > > Thanks! > Stefan >
diff --git a/diff.c b/diff.c index c378ce3daf..89559293e7 100644 --- a/diff.c +++ b/diff.c @@ -750,6 +750,8 @@ struct emitted_diff_symbol { const char *line; int len; int flags; + int indent_off; + int indent_width; enum diff_symbol s; }; #define EMITTED_DIFF_SYMBOL_INIT {NULL} @@ -780,44 +782,68 @@ struct moved_entry { struct moved_entry *next_line; }; -/** - * The struct ws_delta holds white space differences between moved lines, i.e. - * between '+' and '-' lines that have been detected to be a move. - * The string contains the difference in leading white spaces, before the - * rest of the line is compared using the white space config for move - * coloring. The current_longer indicates if the first string in the - * comparision is longer than the second. - */ -struct ws_delta { - char *string; - unsigned int current_longer : 1; -}; -#define WS_DELTA_INIT { NULL, 0 } - struct moved_block { struct moved_entry *match; - struct ws_delta wsd; + int wsd; /* The whitespace delta of this block */ }; static void moved_block_clear(struct moved_block *b) { - FREE_AND_NULL(b->wsd.string); - b->match = NULL; + memset(b, 0, sizeof(*b)); +} + +static void fill_es_indent_data(struct emitted_diff_symbol *es) +{ + unsigned int off = 0; + int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK; + const char *s = es->line; + const int len = es->len; + + /* skip any \v \f \r at start of indentation */ + while (s[off] == '\f' || s[off] == '\v' || + (s[off] == '\r' && off < len - 1)) + off++; + + /* calculate the visual width of indentation */ + while(1) { + if (s[off] == ' ') { + width++; + off++; + } else if (s[off] == '\t') { + width += tab_width - (width % tab_width); + while (s[++off] == '\t') + width += tab_width; + } else { + break; + } + } + + es->indent_off = off; + es->indent_width = width; } static int compute_ws_delta(const struct emitted_diff_symbol *a, - const struct emitted_diff_symbol *b, - struct ws_delta *out) + const struct emitted_diff_symbol *b, + int *out) { - const struct emitted_diff_symbol *longer = a->len > b->len ? a : b; - const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a; - int d = longer->len - shorter->len; + int a_len = a->len, + b_len = b->len, + a_off = a->indent_off, + a_width = a->indent_width, + b_off = b->indent_off, + b_width = b->indent_width; + int delta; - if (strncmp(longer->line + d, shorter->line, shorter->len)) + if (a->s == DIFF_SYMBOL_PLUS) + delta = a_width - b_width; + else + delta = b_width - a_width; + + if (a_len - a_off != b_len - b_off || + memcmp(a->line + a_off, b->line + b_off, a_len - a_off)) return 0; - out->string = xmemdupz(longer->line, d); - out->current_longer = (a == longer); + *out = delta; return 1; } @@ -833,8 +859,11 @@ static int cmp_in_block_with_wsd(const struct diff_options *o, const char *a = cur->es->line, *b = match->es->line, *c = l->line; - const char *orig_a = a; - int wslen; + int a_off = cur->es->indent_off, + a_width = cur->es->indent_width, + c_off = l->indent_off, + c_width = l->indent_width; + int delta; /* * We need to check if 'cur' is equal to 'match'. As those @@ -848,35 +877,20 @@ static int cmp_in_block_with_wsd(const struct diff_options *o, if (al != bl) return 1; - if (!pmb->wsd.string) - /* - * The white space delta is not active? This can happen - * when we exit early in this function. - */ - return 1; - /* - * The indent changes of the block are known and stored in - * pmb->wsd; however we need to check if the indent changes of the - * current line are still the same as before. - * - * To do so we need to compare 'l' to 'cur', adjusting the - * one of them for the white spaces, depending which was longer. + * The indent changes of the block are known and stored in pmb->wsd; + * however we need to check if the indent changes of the current line + * match those of the current block and that the text of 'l' and 'cur' + * after the indentation match. */ + if (cur->es->s == DIFF_SYMBOL_PLUS) + delta = a_width - c_width; + else + delta = c_width - a_width; - wslen = strlen(pmb->wsd.string); - if (pmb->wsd.current_longer) { - c += wslen; - cl -= wslen; - } else { - a += wslen; - al -= wslen; - } - - if (al != cl || memcmp(orig_a, b, bl) || memcmp(a, c, al)) - return 1; - - return 0; + return !(delta == pmb->wsd && al - a_off == cl - c_off && + !memcmp(a, b, al) && ! + memcmp(a + a_off, c + c_off, al - a_off)); } static int moved_entry_cmp(const void *hashmap_cmp_fn_data, @@ -942,6 +956,9 @@ static void add_lines_to_move_detection(struct diff_options *o, continue; } + if (o->color_moved_ws_handling & + COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) + fill_es_indent_data(&o->emitted_symbols->buf[n]); key = prepare_entry(o, n); if (prev_line && prev_line->es->s == o->emitted_symbols->buf[n].s) prev_line->next_line = key; @@ -1020,8 +1037,7 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb, if (lp < pmb_nr && rp > -1 && lp < rp) { pmb[lp] = pmb[rp]; - pmb[rp].match = NULL; - pmb[rp].wsd.string = NULL; + memset(&pmb[rp], 0, sizeof(pmb[rp])); rp--; lp++; } @@ -1141,7 +1157,7 @@ static void mark_color_as_moved(struct diff_options *o, &pmb[pmb_nr].wsd)) pmb[pmb_nr++].match = match; } else { - pmb[pmb_nr].wsd.string = NULL; + pmb[pmb_nr].wsd = 0; pmb[pmb_nr++].match = match; } } @@ -1507,7 +1523,7 @@ static void emit_diff_symbol_from_struct(struct diff_options *o, static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s, const char *line, int len, unsigned flags) { - struct emitted_diff_symbol e = {line, len, flags, s}; + struct emitted_diff_symbol e = {line, len, flags, 0, 0, s}; if (o->emitted_symbols) append_emitted_diff_symbol(o, &e); diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh index fe8a2ab06e..e023839ba6 100755 --- a/t/t4015-diff-whitespace.sh +++ b/t/t4015-diff-whitespace.sh @@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta incompatible with other space opti test_i18ngrep allow-indentation-change err ' +test_expect_success 'compare mixed whitespace delta across moved blocks' ' + + git reset --hard && + tr Q_ "\t " <<-EOF >text.txt && + ____Indented text to + _Q____be further indented by four spaces across + ____Qseveral lines + QQ____These two lines have had their + ____indentation reduced by four spaces + Qdifferent indentation change + ____too short + EOF + + git add text.txt && + git commit -m "add text.txt" && + + tr Q_ "\t " <<-EOF >text.txt && + QIndented text to + QQbe further indented by four spaces across + Q____several lines + Q_QThese two lines have had their + indentation reduced by four spaces + QQdifferent indentation change + __Qtoo short + EOF + + git -c color.diff.whitespace="normal red" \ + -c core.whitespace=space-before-tab \ + diff --color --color-moved --ws-error-highlight=all \ + --color-moved-ws=allow-indentation-change >actual.raw && + grep -v "index" actual.raw | test_decode_color >actual && + + cat <<-\EOF >expected && + <BOLD>diff --git a/text.txt b/text.txt<RESET> + <BOLD>--- a/text.txt<RESET> + <BOLD>+++ b/text.txt<RESET> + <CYAN>@@ -1,7 +1,7 @@<RESET> + <BOLD;MAGENTA>-<RESET><BOLD;MAGENTA> Indented text to<RESET> + <BOLD;MAGENTA>-<RESET><BRED> <RESET> <BOLD;MAGENTA> be further indented by four spaces across<RESET> + <BOLD;MAGENTA>-<RESET><BRED> <RESET> <BOLD;MAGENTA>several lines<RESET> + <BOLD;BLUE>-<RESET> <BOLD;BLUE> These two lines have had their<RESET> + <BOLD;BLUE>-<RESET><BOLD;BLUE> indentation reduced by four spaces<RESET> + <BOLD;MAGENTA>-<RESET> <BOLD;MAGENTA>different indentation change<RESET> + <RED>-<RESET><RED> too short<RESET> + <BOLD;CYAN>+<RESET> <BOLD;CYAN>Indented text to<RESET> + <BOLD;CYAN>+<RESET> <BOLD;CYAN>be further indented by four spaces across<RESET> + <BOLD;CYAN>+<RESET> <BOLD;CYAN> several lines<RESET> + <BOLD;YELLOW>+<RESET> <BRED> <RESET> <BOLD;YELLOW>These two lines have had their<RESET> + <BOLD;YELLOW>+<RESET><BOLD;YELLOW>indentation reduced by four spaces<RESET> + <BOLD;CYAN>+<RESET> <BOLD;CYAN>different indentation change<RESET> + <GREEN>+<RESET><BRED> <RESET> <GREEN>too short<RESET> + EOF + + test_cmp expected actual +' + test_done