Message ID | 20221112143616.1429-1-philipoakley@iee.email (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v4] pretty-formats: add hard truncation, without ellipsis, options | expand |
Philip Oakley <philipoakley@iee.email> writes: > Instead of replacing with "..", replace with the empty string, > implied by passing NULL, and adjust the padding length calculation. What's the point of saying "implied by passing NULL" here? Is it an excuse for passing NULL when passing "" would have sufficed and been more natural, or something? Also, it is unclear to whom you are passing the NULL. I think that it is sufficient that you said "replace with the empty string" there. > Extend the existing tests for these pretty formats to include > `Trunc` and Ltrunc` options matching the `trunc` and `ltrunc` > tests. A more important thing to say is that we add Trunc and Ltrunc, than we test for these new features ;-) You may also want to explain why there is no matching Mtrunc added. I also have another comment on the design. Imagine there are series of wide characters, each occupying two display columns, and you give 6 display columns to truncate such a string into. "trunc" would give you "[][].." (where [] denotes one such wide letter that occupies two display columns), and "Trunc" would give you "[][][]". Now if you give only 5 display columns, to fill instead of 6, what should happen? I do not recall how ".."-stuffed truncation works in this case but it should notice that it cannot stuff 3 wide letters and give you "[][].". The current code may be already buggy, but at least at the design level, it is fairly clear what the feature _should_ do. As a design question, what should "Trunc" do in such a case now? I do not think we can still call it "hard truncate" if the feature gives "[][]" (i.e. fill only 4 display columns, resulting in a string that is not wide enough) or "[][][]" (i.e. exceed 5 columns that are given), but of course chomping a letter in the middle is not acceptable behaviour, so ...
Hi Junio On 21/11/2022 00:34, Junio C Hamano wrote: > Philip Oakley <philipoakley@iee.email> writes: > >> Instead of replacing with "..", replace with the empty string, >> implied by passing NULL, and adjust the padding length calculation. > What's the point of saying "implied by passing NULL" here? Is it an > excuse for passing NULL when passing "" would have sufficed and been > more natural, or something? Passing the empty string was my first approach, however Taylor had commented "why pass the empty string, when NULL will do", hence I checked, and yes, we can pass NULL, so I followed that guidance on the re-roll. > Also, it is unclear to whom you are > passing the NULL. I think that it is sufficient that you said > "replace with the empty string" there. I could drop the commit message comment, and keep the NULL being passed tostrbuf_utf8_replace to indicate the empty string, though that may create the same reviewer question that Taylor had. > >> Extend the existing tests for these pretty formats to include >> `Trunc` and Ltrunc` options matching the `trunc` and `ltrunc` >> tests. > A more important thing to say is that we add Trunc and Ltrunc, than > we test for these new features ;-) That would be to swap the paragraphs about, yes? > > You may also want to explain why there is no matching Mtrunc added. Can do, though it felt obvious that the original mtrunc ellipsis would be necessary for the mid-case. > > I also have another comment on the design. > > Imagine there are series of wide characters, each occupying two > display columns, and you give 6 display columns to truncate such a > string into. "trunc" would give you "[][].." (where [] denotes one > such wide letter that occupies two display columns), and "Trunc" > would give you "[][][]". Now if you give only 5 display columns, > to fill instead of 6, what should happen? My reading of the existing code for ltrunc/mtrunc/trunc was that all these padding conditions were already covered. It was simply a matter of column counting. > > I do not recall how ".."-stuffed truncation works in this case but > it should notice that it cannot stuff 3 wide letters and give you > "[][].". The current code may be already buggy, but at least at the > design level, it is fairly clear what the feature _should_ do. > > As a design question, what should "Trunc" do in such a case now? I > do not think we can still call it "hard truncate" if the feature > gives "[][]" (i.e. fill only 4 display columns, resulting in a > string that is not wide enough) or "[][][]" (i.e. exceed 5 columns > that are given), but of course chomping a letter in the middle is > not acceptable behaviour, so ... The design had already covered those cases. The author already had those thoughts -- Philip
Philip Oakley <philipoakley@iee.email> writes: >> As a design question, what should "Trunc" do in such a case now? I >> do not think we can still call it "hard truncate" if the feature >> gives "[][]" (i.e. fill only 4 display columns, resulting in a >> string that is not wide enough) or "[][][]" (i.e. exceed 5 columns >> that are given), but of course chomping a letter in the middle is >> not acceptable behaviour, so ... > The design had already covered those cases. The author already had those > thoughts Sorry, I was saying that none of * giving only [][] to fill only 4 display columns, without filling the given 5 display columns, * giving [][][] to fill 6 display columns, exceeding the given 5 display columns, * giving [][][ that chomps a letter in the middle, in a failed attempt to fill exactly 5 displya columns. would be a sensible design of the behaviour for "Trunc", so I am not sure what "had already covered" really mean...
On 22/11/2022 00:57, Junio C Hamano wrote: > Philip Oakley <philipoakley@iee.email> writes: > >>> As a design question, what should "Trunc" do in such a case now? I >>> do not think we can still call it "hard truncate" if the feature >>> gives "[][]" (i.e. fill only 4 display columns, resulting in a >>> string that is not wide enough) or "[][][]" (i.e. exceed 5 columns >>> that are given), but of course chomping a letter in the middle is >>> not acceptable behaviour, so ... >> The design had already covered those cases. The author already had those >> thoughts > Sorry, I was saying that none of > > * giving only [][] to fill only 4 display columns, without filling > the given 5 display columns, > > * giving [][][] to fill 6 display columns, exceeding the given 5 > display columns, > > * giving [][][ that chomps a letter in the middle, in a failed > attempt to fill exactly 5 displya columns. > > would be a sensible design of the behaviour for "Trunc", so I am not > sure what "had already covered" really mean... > I'm still unsure what you are trying to say here. Is this a question about the prior `trunc`, `mtrunc`, and `ltrunc` design and tests? e.g. how complete are their tests? Is it looking for an extended explanation of the 'fit in N-columns' design approaches that they all have? I'm happy to add a paragraph saying that an `Mtrunc`, i.e. miss out the middle characters to fit the set column width, without using the two dot ellipsis (`..`), was considered a non starter because of potential confusion when looking at such output when tabulated. The existing code (and tests) already covers the need to hide the characters those two dots (ellipsis) consumed win the N-column tabulated output. The tests also include utf8-encoded characters. All the previous tests for `trunc` and `'ltrunc` (i.e. with ellipsis) have been repeated for the `Trunc` and `Ltrunc` (without ellipsis) hard truncation commands, and their expected outputs updated, including the use of the qz_to_tab_space for those case where trailing spaces are now present. Does that cover your questions? -- Philip
Philip Oakley <philipoakley@iee.email> writes: > On 22/11/2022 00:57, Junio C Hamano wrote: >> Philip Oakley <philipoakley@iee.email> writes: >> >>>> As a design question, what should "Trunc" do in such a case now? I >>>> do not think we can still call it "hard truncate" if the feature >>>> gives "[][]" (i.e. fill only 4 display columns, resulting in a >>>> string that is not wide enough) or "[][][]" (i.e. exceed 5 columns >>>> that are given), but of course chomping a letter in the middle is >>>> not acceptable behaviour, so ... >>> The design had already covered those cases. The author already had those >>> thoughts >> Sorry, I was saying that none of >> >> * giving only [][] to fill only 4 display columns, without filling >> the given 5 display columns, >> >> * giving [][][] to fill 6 display columns, exceeding the given 5 >> display columns, >> >> * giving [][][ that chomps a letter in the middle, in a failed >> attempt to fill exactly 5 displya columns. >> >> would be a sensible design of the behaviour for "Trunc", so I am not >> sure what "had already covered" really mean... >> > I'm still unsure what you are trying to say here. > > Is this a question about the prior `trunc`, `mtrunc`, and `ltrunc` > design and tests? > e.g. how complete are their tests? No. As I said, the existing lowercase ones may already be buggy (I didn't check), in that they may do "[][].." or "[][][]" when told to "trunc" fill a string with four or more double-width letters into a 5 display space. But the point is at least for these with ellipsis it is fairly clear what the desired behaviour is. For "trunc" in the above example, I think the right thing for it to do would be to do "[][].", i.e. consume exactly 5 display columns, and avoid exceeding the given space by not giving two dots but just one. But with "hard truncate", I do not think we can define any sensible behaviour when we ask it to do the same. Giving "[][]" leaves one display space unconsumed and giving "[][][]" would exceed the given space, so anything you would write after that would be unaligned on the line. As to the tests, the question was, whatever the designed behaviour for the above case, if they record the design choice made by this series (even though, as I said, I suspect no design choice for the "hard-fill/trunc odd number of columns with a run of double-width letters" problem is satisfactory).
On 25/11/2022 07:11, Junio C Hamano wrote: > Philip Oakley <philipoakley@iee.email> writes: > >> On 22/11/2022 00:57, Junio C Hamano wrote: >>> Philip Oakley <philipoakley@iee.email> writes: >>> >>>>> As a design question, what should "Trunc" do in such a case now? I >>>>> do not think we can still call it "hard truncate" if the feature >>>>> gives "[][]" (i.e. fill only 4 display columns, resulting in a >>>>> string that is not wide enough) or "[][][]" (i.e. exceed 5 columns >>>>> that are given), but of course chomping a letter in the middle is >>>>> not acceptable behaviour, so ... >>>> The design had already covered those cases. The author already had those >>>> thoughts >>> Sorry, I was saying that none of >>> >>> * giving only [][] to fill only 4 display columns, without filling >>> the given 5 display columns, >>> >>> * giving [][][] to fill 6 display columns, exceeding the given 5 >>> display columns, >>> >>> * giving [][][ that chomps a letter in the middle, in a failed >>> attempt to fill exactly 5 displya columns. >>> >>> would be a sensible design of the behaviour for "Trunc", so I am not >>> sure what "had already covered" really mean... >>> >> I'm still unsure what you are trying to say here. >> >> Is this a question about the prior `trunc`, `mtrunc`, and `ltrunc` >> design and tests? >> e.g. how complete are their tests? > No. As I said, the existing lowercase ones may already be buggy (I > didn't check), That question hadn't come across in the previous emails. > in that they may do "[][].." or "[][][]" when told to > "trunc" fill a string with four or more double-width letters into a > 5 display space. But the point is at least for these with ellipsis > it is fairly clear what the desired behaviour is. That "is fairly clear" is probably the problem. In retrospect it's not clear in the docs that the "%<(N" format is (would appear to be) about defining the display width, in terminal character columns, that the selected parameter is to be displayed within. The code already pads the displayed parameter with spaces as required if the parameter is shorter than the display width - the else condition in pretty.c L1750 > For "trunc" in > the above example, I think the right thing for it to do would be to > do "[][].", i.e. consume exactly 5 display columns, and avoid > exceeding the given space by not giving two dots but just one. The existing choice is padding "[][]" with a single space to reach 5 display chars. For the 6-char "[][][]" truncation it is "[][..", i.e. 3 chars from "[][][]", then the two ".." dots of the ellipsis. > > But with "hard truncate", I do not think we can define any sensible > behaviour Given the existing code and the display column expectation, it felt rather simple to apply the hard cut at the display width, or pad with spaces if the chars to display were shorter than the allocated columns. > when we ask it to do the same. Giving "[][]" leaves one > display space unconsumed and giving "[][][]" would exceed the given > space, so anything you would write after that would be unaligned on > the line. A careful read (and testing) of the existing 'mtrunc' does show a rounding error bug though. Its a confusion between the computed start and end points of the cut that loses one display column (the string displayed is short by one when the count is odd, IIUC). > > As to the tests, the question was, whatever the designed behaviour > for the above case, if they record the design choice made by this > series (even though, as I said, I suspect no design choice for the > "hard-fill/trunc odd number of columns with a run of double-width > letters" problem is satisfactory). The existing tests already cover the trailing space padding issues. However I'd agree that the documentation (and original commit messages for the existing code) do not make clear the distinction between display columns to be filled and characters in the displayed parameters (aka 'placeholders'). Within that, there is also the "%<(N" and "%<|(M" lack of clarity, where the N count is for a single place holder's character count, while the M value is the terminal's display column number, of the 24x80 column style. Finally, the docs use of '<N>' in the middle the place holders title that also use `<` and `>` is a readability nightmare. Given that the text then only talks about `N` it may be sensible to drop the surrounding `<.>` to reduce confusion for these alignment placeholders. -- Philip
On 26/11/2022 14:32, Philip Oakley wrote: > A careful read (and testing) of the existing 'mtrunc' does show a > rounding error bug though. Its a confusion between the computed start > and end points of the cut that loses one display column (the string > displayed is short by one when the count is odd, IIUC). I had confused myself. My test case `git log --graph --format="%<(19,mtrunc)%d=2" -6` had a 2-char step in the graph that I confused with the effects between repeated runs with the 'N' value adjusted by +1 and -1. I then jumped to conclusions about the integer division in the mid string position of the mtrunc case win the code. Sorry for that.
Philip Oakley <philipoakley@iee.email> writes: >> in that they may do "[][].." or "[][][]" when told to >> "trunc" fill a string with four or more double-width letters into a >> 5 display space. But the point is at least for these with ellipsis >> it is fairly clear what the desired behaviour is. > > That "is fairly clear" is probably the problem. In retrospect it's not > clear in the docs that the "%<(N" format is (would appear to be) about > defining the display width, in terminal character columns, that the > selected parameter is to be displayed within. > > The code already pads the displayed parameter with spaces as required if > the parameter is shorter than the display width - the else condition in > pretty.c L1750 > >> For "trunc" in >> the above example, I think the right thing for it to do would be to >> do "[][].", i.e. consume exactly 5 display columns, and avoid >> exceeding the given space by not giving two dots but just one. > > The existing choice is padding "[][]" with a single space to reach 5 > display chars. > For the 6-char "[][][]" truncation it is "[][..", i.e. 3 chars from > "[][][]", then the two ".." dots of the ellipsis. Here, I realize that I did not explain the scenario well. The message you are responding to was meant to be a clarification of my earlier message and it should have done a better job but apparently I failed. Sorry, and let me try again. The single example I meant to use to illustrate the scenario I worry about is this. There is a string, in which there are four (or more) letters, each of which occupies two display columns. And '[]' in my earlier messages stood for a SINGLE such letter (I just wanted to stick to ASCII, instead of using East Asian script, for illustration). So "[][.." is not possible (you are chomping the second such letter in half). I could use East Asian 一二三四 (there are four letters, denoting one, two, three, and four, each occupying two display spaces when typeset in a fixed width font), but to make it easier to see in ASCII only text, let's pretend "[1]", "[2]", "[3]", "[4]" are such letters. You cannot chomp them in the middle (and please pretend each of them occupy two, not three, display spaces). When the given display space is 6 columns, we can fit 2 such letters plus ".." in the space. If the original string were [1][2][3][4], it is clear trunk and ltrunk can do "[1][2].." (remember [n] stands for a single letter whose width is 2 columns, so that takes 6 columns) and "..[3][4]", respectively. It also is clear that Trunk and Ltrunk can do "[1][2][3]" and "[2][3][4]", respectively. We truncate the given string so that we fill the alloted display columns fully. If the given display space is 5 columns, the desirable behaviour for trunk and ltrunk is still clear. Instead of consuming two dots, we could use a single dot as the filler. As I said, I suspect that the implementation of trunk and ltrunc does this correctly, though. My worry is it is not clear what Trunk and Ltrunk should do in that case. There is no way to fit a substring of [1][2][3][4] into 5 columns without any filler.
On 26/11/2022 23:19, Junio C Hamano wrote: > Philip Oakley <philipoakley@iee.email> writes: > >>> in that they may do "[][].." or "[][][]" when told to >>> "trunc" fill a string with four or more double-width letters into a >>> 5 display space. But the point is at least for these with ellipsis >>> it is fairly clear what the desired behaviour is. >> That "is fairly clear" is probably the problem. In retrospect it's not >> clear in the docs that the "%<(N" format is (would appear to be) about >> defining the display width, in terminal character columns, that the >> selected parameter is to be displayed within. >> >> The code already pads the displayed parameter with spaces as required if >> the parameter is shorter than the display width - the else condition in >> pretty.c L1750 >> >>> For "trunc" in >>> the above example, I think the right thing for it to do would be to >>> do "[][].", i.e. consume exactly 5 display columns, and avoid >>> exceeding the given space by not giving two dots but just one. >> The existing choice is padding "[][]" with a single space to reach 5 >> display chars. >> For the 6-char "[][][]" truncation it is "[][..", i.e. 3 chars from >> "[][][]", then the two ".." dots of the ellipsis. > Here, I realize that I did not explain the scenario well. The > message you are responding to was meant to be a clarification of my > earlier message and it should have done a better job but apparently > I failed. Sorry, and let me try again. > > The single example I meant to use to illustrate the scenario I worry > about is this. There is a string, in which there are four (or more) > letters, each of which occupies two display columns. And '[]' in my > earlier messages stood for a SINGLE such letter (I just wanted to > stick to ASCII, instead of using East Asian script, for > illustration). So "[][.." is not possible (you are chomping the > second such letter in half). > > I could use East Asian 一二三四 (there are four letters, denoting > one, two, three, and four, each occupying two display spaces when > typeset in a fixed width font), Thanks for that clarification, I'd been thinking it was about c char (bytes) such as ASCII and multi-byte characters (code points), e.g. European umlaut style distinctions. I hadn't really picked up on the distinction between wide and narrow 'glyphs' (if that's the right term to use). I see that the code does properly count the widths of narrow and wide code points as 1 and 2 columns of the display, but then doesn't explicitly try any adjustment for the wide code point problem you noted. > but to make it easier to see in > ASCII only text, let's pretend "[1]", "[2]", "[3]", "[4]" are such > letters. You cannot chomp them in the middle (and please pretend > each of them occupy two, not three, display spaces). > > When the given display space is 6 columns, we can fit 2 such letters > plus ".." in the space. If the original string were [1][2][3][4], > it is clear trunk and ltrunk can do "[1][2].." (remember [n] stands > for a single letter whose width is 2 columns, so that takes 6 > columns) and "..[3][4]", respectively. It also is clear that Trunk > and Ltrunk can do "[1][2][3]" and "[2][3][4]", respectively. We > truncate the given string so that we fill the alloted display > columns fully. > > If the given display space is 5 columns, the desirable behaviour for > trunk and ltrunk is still clear. Instead of consuming two dots, we > could use a single dot as the filler. As I said, I suspect that the > implementation of trunk and ltrunc does this correctly, though. I believe there is a possible solution that, if we detect a column over-run, then we can go back and replace the current two column double dot with a narrow U+2026 Horizontal ellipsis, to regain the needed column. > > My worry is it is not clear what Trunk and Ltrunk should do in that > case. There is no way to fit a substring of [1][2][3][4] into 5 > columns without any filler. For this case where the final code point overruns, my solution could/would be to use the Vertical ellipsis U+22EE "⋮" to re-write that final character (though the Unicode Replacement Character "�" could be used, but that's ugly) I suspect the code would need some close reading to ensure that the column counting and replacement would correctly cope with the 'off by one' wide width case inside the strbuf_utf8_replace(). I.e. given the same off-by-one position and replacement length, get back to the same point to replace either the double dot or the final code point in an idempotent manner. The logic feels sound, as long as there are no three wide crocodile code-points. Either we counted the right number of columns, or we over-ran by one, so we go back and substitute with a one-for-two replacement. Philip For watchers, https://github.com/microsoft/terminal/issues/4345 shows some of the issues in the general case.
Philip Oakley <philipoakley@iee.email> writes: >> If the given display space is 5 columns, the desirable behaviour for >> trunk and ltrunk is still clear. Instead of consuming two dots, we >> could use a single dot as the filler. As I said, I suspect that the >> implementation of trunk and ltrunc does this correctly, though. > > I believe there is a possible solution that, if we detect a column > over-run, then we can go back and replace the current two column double > dot with a narrow U+2026 Horizontal ellipsis, to regain the needed column. It would work, too. As "trunc" and "ltrunc" (and "mtrunc") are about "truncate and show the filler to indicate some letters in the original are not shown, and make the result exactly N columns", it can use a narrower filler than the originally specified and use the alloted space. >> My worry is it is not clear what Trunk and Ltrunk should do in that >> case. There is no way to fit a substring of [1][2][3][4] into 5 >> columns without any filler. > For this case where the final code point overruns, my solution > could/would be to use the Vertical ellipsis U+22EE "⋮" to re-write that > final character (though the Unicode Replacement Character "�" could be > used, but that's ugly) That would be "trunc" not "Trunc", wouldn't it? If the capital letter verions are "hard truncate" without filler, it fundamentally cannot be done to fill 5 display spaces only with letters each of which take 2 display spaces. I am not saying that there is an impossible situation to handle and "hard truncate" primitives should not be invented. I just want us to specify (and document) what happens in such a case, so that it no longer is an "impossible" situation (we can say "odd/leftover columns are filled with minimum number of SP to make the result N display spaces", for example). And not calling it "hard truncate" might be a good place to start, as it won't be "hard truncate" with such a padding.
Apologies for the delays. On 26/11/2022 23:19, Junio C Hamano wrote: > Philip Oakley <philipoakley@iee.email> writes: > >>> in that they may do "[][].." or "[][][]" when told to >>> "trunc" fill a string with four or more double-width letters into a >>> 5 display space. But the point is at least for these with ellipsis >>> it is fairly clear what the desired behaviour is. >> That "is fairly clear" is probably the problem. In retrospect it's not >> clear in the docs that the "%<(N" format is (would appear to be) about >> defining the display width, in terminal character columns, that the >> selected parameter is to be displayed within. >> >> The code already pads the displayed parameter with spaces as required if >> the parameter is shorter than the display width - the else condition in >> pretty.c L1750 >> >>> For "trunc" in >>> the above example, I think the right thing for it to do would be to >>> do "[][].", i.e. consume exactly 5 display columns, and avoid >>> exceeding the given space by not giving two dots but just one. >> The existing choice is padding "[][]" with a single space to reach 5 >> display chars. >> For the 6-char "[][][]" truncation it is "[][..", i.e. 3 chars from >> "[][][]", then the two ".." dots of the ellipsis. > Here, I realize that I did not explain the scenario well. The > message you are responding to was meant to be a clarification of my > earlier message and it should have done a better job but apparently > I failed. Sorry, and let me try again. > > The single example I meant to use to illustrate the scenario I worry > about is this. There is a string, in which there are four (or more) > letters, each of which occupies two display columns. I hadn't appreciated that utf8 has 'wide' characters that are effectively double width, i.e. use 2 display columns, such as emojis. I had been well aware of the multi-byte nature of utf8, and vaguely aware of the potential for combined characters. > And '[]' in my > earlier messages stood for a SINGLE such letter (I just wanted to > stick to ASCII, instead of using East Asian script, for > illustration). So "[][.." is not possible (you are chomping the > second such letter in half). > > I could use East Asian 一二三四 (there are four letters, denoting > one, two, three, and four, each occupying two display spaces when > typeset in a fixed width font), These 4 characters came through ok. > but to make it easier to see in > ASCII only text, let's pretend "[1]", "[2]", "[3]", "[4]" are such > letters. You cannot chomp them in the middle (and please pretend > each of them occupy two, not three, display spaces). > > When the given display space is 6 columns, we can fit 2 such letters > plus ".." in the space. If the original string were [1][2][3][4], > it is clear trunk and ltrunk can do "[1][2].." > (remember [n] stands > for a single letter whose width is 2 columns, so that takes 6 > columns) Aside: The man pages aren't that clear about the distinction between display columns and characters, both for these width based formats (allow this placeholder parameter a width of <N> columns), and the terminal's column position formats (start this parameter at the on-screen column <N>) . > and "..[3][4]", respectively. It also is clear that Trunk > and Ltrunk can do "[1][2][3]" and "[2][3][4]", respectively. We > truncate the given string so that we fill the alloted display > columns fully. While this example is clear, it's not clear what should be done if we have mixed width strings, e.g. with emojis, as the boundaries in random text will also be randomly placed. > > If the given display space is 5 columns, the desirable behaviour for > trunk and ltrunk is still clear. Instead of consuming two dots, we > could use a single dot as the filler. As I said, I suspect that the > implementation of trunk and ltrunc does this correctly, though. The current implementation is buggy in the case of combining character code points. We forget to add the (zero space) combining code points once we have the base character when it is the character before the split (where the double dot ".." is inserted). I.e. the zero space characters are added after the ".." double dot. This can cause problems with the existing code for `mtrunc` where the 'lost' combing code points then become attached to the preceding "two dots". > > My worry is it is not clear what Trunk and Ltrunk should do in that > case. There is no way to fit a substring of [1][2][3][4] into 5 > columns without any filler. I'm not sure that anyone has fully solved that issue of what to do when a wide character overlaps the end of an available display space, such as terminal word-wrap when resizing the window (in my mintty terminal window it displays a 'space' then starts the wide character on a new line). There's also a 'bug' reported for one of the microsoft terminals that the user wants to position the cursor at a column that is the middle of a wide emoji character and wants half of it over written! For our case (no wordwrap) I'd be minded to to mark the end column with a single width character to show that some (wide) thing should be here, but we've had to cut it off, such as the vertical ellipsis. At least it's explainable. I'll at least work on the doc clarification regarding the column width, column position and wide char (2-col) issue, and hopefully a few failing tests for the combing code point and the wide char fitment issue. -- Philip
Philip Oakley <philipoakley@iee.email> writes: >> and "..[3][4]", respectively. It also is clear that Trunk >> and Ltrunk can do "[1][2][3]" and "[2][3][4]", respectively. We >> truncate the given string so that we fill the alloted display >> columns fully. > > While this example is clear, it's not clear what should be done if we > have mixed width strings, e.g. with emojis, as the boundaries in random > text will also be randomly placed. As long as wider letters have widths that is integral of the narrowest letters (ASCII?), "use N columns, padding with '.' if needed" has a reasonable solution, no? "[1]A[2]" occupies 2+1+2 columns, so trunc that is given only 3 (or 4) columns can drop the last "[2]" and fit "[1]A" in the given columns with padding. > I'll at least work on the doc clarification regarding the column width, > column position and wide char (2-col) issue, and hopefully a few failing > tests for the combing code point and the wide char fitment issue. Thanks, that would give us a very good starting point.
diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt index 0b4c1c8d98a..0bd339db333 100644 --- a/Documentation/pretty-formats.txt +++ b/Documentation/pretty-formats.txt @@ -146,14 +146,15 @@ The placeholders are: '%m':: left (`<`), right (`>`) or boundary (`-`) mark '%w([<w>[,<i1>[,<i2>]]])':: switch line wrapping, like the -w option of linkgit:git-shortlog[1]. -'%<(<N>[,trunc|ltrunc|mtrunc])':: make the next placeholder take at +'%<(<N>[,trunc|ltrunc|mtrunc|Trunc|Ltrunc])':: make the next placeholder take at least N columns, padding spaces on the right if necessary. Optionally - truncate at the beginning (ltrunc), - the middle (mtrunc) or the end - (trunc) if the output is longer than - N columns. Note that truncating + truncate (with ellipsis '..') at the beginning (ltrunc) `..ft`, + the middle (mtrunc) `mi..le` or the end + (trunc) `rig..` if the output is longer than + N columns. Note that truncating with ellipsis only works correctly with N >= 2. + Use (Trunc|Ltrunc) for hard truncation without ellipsis. '%<|(<N>)':: make the next placeholder take at least until Nth columns, padding spaces on the right if necessary '%>(<N>)', '%>|(<N>)':: similar to '%<(<N>)', '%<|(<N>)' respectively, diff --git a/pretty.c b/pretty.c index 6cb363ae1c9..1e873d3f41a 100644 --- a/pretty.c +++ b/pretty.c @@ -857,7 +857,9 @@ enum trunc_type { trunc_none, trunc_left, trunc_middle, - trunc_right + trunc_right, + trunc_left_hard, + trunc_right_hard }; struct format_commit_context { @@ -1145,6 +1147,10 @@ static size_t parse_padding_placeholder(const char *placeholder, c->truncate = trunc_left; else if (starts_with(start, "mtrunc)")) c->truncate = trunc_middle; + else if (starts_with(start, "Ltrunc)")) + c->truncate = trunc_left_hard; + else if (starts_with(start, "Trunc)")) + c->truncate = trunc_right_hard; else return 0; } else @@ -1743,6 +1749,16 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */ padding - 2, len - (padding - 2), ".."); break; + case trunc_left_hard: + strbuf_utf8_replace(&local_sb, + 0, len - padding, + NULL); + break; + case trunc_right_hard: + strbuf_utf8_replace(&local_sb, + padding, len - padding, + NULL); + break; case trunc_none: break; } diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh index e448ef2928a..55dca307983 100755 --- a/t/t4205-log-pretty-formats.sh +++ b/t/t4205-log-pretty-formats.sh @@ -261,6 +261,17 @@ test_expect_success 'left alignment formatting with trunc' ' test_cmp expected actual ' +test_expect_success 'left alignment formatting with Trunc' ' + git log --pretty="tformat:%<(10,Trunc)%s" >actual && + qz_to_tab_space <<-\EOF >expected && + message tw + message on + add bar Z + initial. a + EOF + test_cmp expected actual +' + test_expect_success 'left alignment formatting with trunc. i18n.logOutputEncoding' ' git -c i18n.logOutputEncoding=$test_encoding log --pretty="tformat:%<(10,trunc)%s" >actual && qz_to_tab_space <<-\EOF | iconv -f utf-8 -t $test_encoding >expected && @@ -272,6 +283,17 @@ test_expect_success 'left alignment formatting with trunc. i18n.logOutputEncodin test_cmp expected actual ' +test_expect_success 'left alignment formatting with Trunc. i18n.logOutputEncoding' ' + git -c i18n.logOutputEncoding=$test_encoding log --pretty="tformat:%<(10,Trunc)%s" >actual && + qz_to_tab_space <<-\EOF | iconv -f utf-8 -t $test_encoding >expected && + message tw + message on + add bar Z + initial. a + EOF + test_cmp expected actual +' + test_expect_success 'left alignment formatting with ltrunc' ' git log --pretty="tformat:%<(10,ltrunc)%s" >actual && qz_to_tab_space <<-EOF >expected && @@ -283,6 +305,18 @@ test_expect_success 'left alignment formatting with ltrunc' ' test_cmp expected actual ' +test_expect_success 'left alignment formatting with Ltrunc' ' + git log --pretty="tformat:%<(10,Ltrunc)%s" >actual && + qz_to_tab_space <<-EOF >expected && + essage two + essage one + add bar Z + an${sample_utf8_part}lich + EOF + test_cmp expected actual +' + + test_expect_success 'left alignment formatting with ltrunc. i18n.logOutputEncoding' ' git -c i18n.logOutputEncoding=$test_encoding log --pretty="tformat:%<(10,ltrunc)%s" >actual && qz_to_tab_space <<-EOF | iconv -f utf-8 -t $test_encoding >expected && @@ -294,6 +328,16 @@ test_expect_success 'left alignment formatting with ltrunc. i18n.logOutputEncodi test_cmp expected actual ' +test_expect_success 'left alignment formatting with Ltrunc. i18n.logOutputEncoding' ' + git -c i18n.logOutputEncoding=$test_encoding log --pretty="tformat:%<(10,Ltrunc)%s" >actual && + qz_to_tab_space <<-EOF | iconv -f utf-8 -t $test_encoding >expected && + essage two + essage one + add bar Z + an${sample_utf8_part}lich + EOF + test_cmp expected actual +' test_expect_success 'left alignment formatting with mtrunc' ' git log --pretty="tformat:%<(10,mtrunc)%s" >actual && qz_to_tab_space <<-\EOF >expected && @@ -507,6 +551,19 @@ test_expect_success 'left/right alignment formatting with stealing' ' EOF test_cmp expected actual ' + +test_expect_success 'left/right hard alignment formatting with stealing' ' + git commit --amend -m short --author "long long long <long@me.com>" && + git log --pretty="tformat:%<(10,Trunc)%s%>>(10,Ltrunc)% an" >actual && + cat <<-\EOF >expected && + short long long long + message on A U Thor + add bar A U Thor + initial. a A U Thor + EOF + test_cmp expected actual +' + test_expect_success 'left/right alignment formatting with stealing. i18n.logOutputEncoding' ' git -c i18n.logOutputEncoding=$test_encoding log --pretty="tformat:%<(10,trunc)%s%>>(10,ltrunc)% an" >actual && cat <<-\EOF | iconv -f utf-8 -t $test_encoding >expected && @@ -517,6 +574,16 @@ test_expect_success 'left/right alignment formatting with stealing. i18n.logOutp EOF test_cmp expected actual ' +test_expect_success 'left/right alignment formatting with stealing. i18n.logOutputEncoding' ' + git -c i18n.logOutputEncoding=$test_encoding log --pretty="tformat:%<(10,Trunc)%s%>>(10,Ltrunc)% an" >actual && + cat <<-\EOF | iconv -f utf-8 -t $test_encoding >expected && + short long long long + message on A U Thor + add bar A U Thor + initial. a A U Thor + EOF + test_cmp expected actual +' test_expect_success 'strbuf_utf8_replace() not producing NUL' ' git log --color --pretty="tformat:%<(10,trunc)%s%>>(10,ltrunc)%C(auto)%d" | diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh index 41d0ca00b1c..c44935b0ab4 100755 --- a/t/t6006-rev-list-format.sh +++ b/t/t6006-rev-list-format.sh @@ -218,6 +218,13 @@ commit $head1 added (hinzugef${added_utf8_part}gt.. EOF +test_format subject-truncated "%<($truncate_count,Trunc)%s" qz_to_tab_space <<EOF +commit $head2 +changed (ge${changed_utf8_part}ndert) f +commit $head1 +added (hinzugef${added_utf8_part}gt)Z +EOF + test_format body %b <<EOF commit $head2 commit $head1 @@ -400,6 +407,15 @@ commit $head1 added (hinzugef${added_utf8_part_iso88591}gt.. EOF +test_format complex-subject-Trunc "%<($truncate_count,Trunc)%s" qz_to_tab_space <<EOF +commit $head3 +Test printing of com +commit $head2 +changed (ge${changed_utf8_part_iso88591}ndert) f +commit $head1 +added (hinzugef${added_utf8_part_iso88591}gt)Z +EOF + test_format complex-subject-mtrunc "%<($truncate_count,mtrunc)%s" <<EOF commit $head3 Test prin..ex bodies @@ -418,6 +434,14 @@ commit $head1 .. (hinzugef${added_utf8_part_iso88591}gt) foo EOF +test_format complex-subject-Ltrunc "%<($truncate_count,Ltrunc)%s" <<EOF +commit $head3 +ng of complex bodies +commit $head2 +anged (ge${changed_utf8_part_iso88591}ndert) foo +commit $head1 +ed (hinzugef${added_utf8_part_iso88591}gt) foo +EOF test_expect_success 'setup expected messages (for test %b)' ' cat <<-EOF >expected.utf-8 && commit $head3 @@ -455,6 +479,15 @@ commit $head1 added (hinzugef${added_utf8_part}gt.. EOF +test_format complex-subject-commitencoding-unset-Trunc "%<($truncate_count,Trunc)%s" qz_to_tab_space <<EOF +commit $head3 +Test printing of com +commit $head2 +changed (ge${changed_utf8_part}ndert) f +commit $head1 +added (hinzugef${added_utf8_part}gt)Z +EOF + test_format complex-subject-commitencoding-unset-mtrunc "%<($truncate_count,mtrunc)%s" <<EOF commit $head3 Test prin..ex bodies @@ -473,6 +506,15 @@ commit $head1 .. (hinzugef${added_utf8_part}gt) foo EOF +test_format complex-subject-commitencoding-unset-Ltrunc "%<($truncate_count,Ltrunc)%s" <<EOF +commit $head3 +ng of complex bodies +commit $head2 +anged (ge${changed_utf8_part}ndert) foo +commit $head1 +ed (hinzugef${added_utf8_part}gt) foo +EOF + test_format complex-body-commitencoding-unset %b <expected.utf-8 test_expect_success '%x00 shows NUL' '
Instead of replacing with "..", replace with the empty string, implied by passing NULL, and adjust the padding length calculation. Extend the existing tests for these pretty formats to include `Trunc` and Ltrunc` options matching the `trunc` and `ltrunc` tests. Signed-off-by: Philip Oakley <philipoakley@iee.email> --- Removed left over comments from locations that had needed QZ 'space' conversion in the here-docs. To: GitList <git@vger.kernel.org> Cc: Taylor Blau <me@ttaylorr.com> Cc: Junio C Hamano <gitster@pobox.com> Cc: NSENGIYUMVA WILBERFORCE <nsengiyumvawilberforce@gmail.com> In-reply-to: <20221102120853.2013-1-philipoakley@iee.email> This should complete the patch (cf. [1] <Y2rPAGp96IwrLS1T@nand.local>) Note: latest What's Cooking <Y2riRSL+NprJt278@nand.local> hadn't been updated. Tests are included. A final review would be welcomed. Range-diff against v3: 1: 498439f0375 ! 1: d26dabc5271 pretty-formats: add hard truncation, without ellipsis, options @@ t/t6006-rev-list-format.sh: commit $head1 added (hinzugef${added_utf8_part}gt.. EOF -+# ZZZ for a space? +test_format subject-truncated "%<($truncate_count,Trunc)%s" qz_to_tab_space <<EOF +commit $head2 +changed (ge${changed_utf8_part}ndert) f @@ t/t6006-rev-list-format.sh: commit $head1 added (hinzugef${added_utf8_part_iso88591}gt.. EOF -+# need qz qz_to_tab_space +test_format complex-subject-Trunc "%<($truncate_count,Trunc)%s" qz_to_tab_space <<EOF +commit $head3 +Test printing of com @@ t/t6006-rev-list-format.sh: commit $head1 added (hinzugef${added_utf8_part}gt.. EOF -+# need qz_to_tab_space +test_format complex-subject-commitencoding-unset-Trunc "%<($truncate_count,Trunc)%s" qz_to_tab_space <<EOF +commit $head3 +Test printing of com Documentation/pretty-formats.txt | 11 +++--- pretty.c | 18 ++++++++- t/t4205-log-pretty-formats.sh | 67 ++++++++++++++++++++++++++++++++ t/t6006-rev-list-format.sh | 42 ++++++++++++++++++++ 4 files changed, 132 insertions(+), 6 deletions(-)