Message ID | pull.1625.git.1703264469238.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | sideband.c: replace int with size_t for clarity | expand |
On Fri, Dec 22, 2023 at 05:01:09PM +0000, Chandra Pratap via GitGitGadget wrote: > From: Chandra Pratap <chandrapratap3519@gmail.com> > > Replace int with size_t for clarity and remove the > 'NEEDSWORK' tag associated with it. > > Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> > --- > sideband.c: replace int with size_t for clarity > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1625%2FChand-ra%2Fdusra-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1625/Chand-ra/dusra-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1625 > > sideband.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/sideband.c b/sideband.c > index 6cbfd391c47..1599e408d1b 100644 > --- a/sideband.c > +++ b/sideband.c > @@ -69,9 +69,8 @@ void list_config_color_sideband_slots(struct string_list *list, const char *pref > * of the line. This should be called for a single line only, which is > * passed as the first N characters of the SRC array. > * > - * NEEDSWORK: use "size_t n" instead for clarity. > */ > -static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) > +static void maybe_colorize_sideband(struct strbuf *dest, const char *src, size_t n) > { > int i; > Thanks for working on this. There is, however, more potential for improvements. First of all: the "int i" from above. Further down, we read for (i = 0; i < ARRAY_SIZE(keywords); i++) { However, a size of an array can never be negative, so that an unsigned data type is a better choice than a signed. And, arrays can have more elements than an int can address, at least in theory. For a reader it makes more sense, to replace int i; with size_t i; And further down, there is another place for improvments: int len = strlen(p->keyword); if (n < len) continue; Even here, a strlen is never negative. And a size_t is the choice for len, since all "modern" implementations declare strlen() to return size_t Do you think that you can have a look at these changes ? I will be happy to do a review, and possibly other people as well.
Thanks for the feedback, Torsten. I was working on improving the rest of sideband.c as you suggested when I encountered this snippet on line 82: while (0 < n && isspace(*src)) { strbuf_addch(dest, *src); src++; n--; } Here, we are decreasing the value of an unsigned type to potentially below 0 which may lead to overflow and result in some nasty bug. In this case, is it wise to continue with replacing 'int n' with 'size_t n' as the NEEDSWORK tag suggests or should we improve upon the rest of the file and revert 'size_t n' to 'int n' ? On Fri, 22 Dec 2023 at 23:27, Torsten Bögershausen <tboegi@web.de> wrote: > > On Fri, Dec 22, 2023 at 05:01:09PM +0000, Chandra Pratap via GitGitGadget wrote: > > From: Chandra Pratap <chandrapratap3519@gmail.com> > > > > Replace int with size_t for clarity and remove the > > 'NEEDSWORK' tag associated with it. > > > > Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> > > --- > > sideband.c: replace int with size_t for clarity > > > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1625%2FChand-ra%2Fdusra-v1 > > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1625/Chand-ra/dusra-v1 > > Pull-Request: https://github.com/gitgitgadget/git/pull/1625 > > > > sideband.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/sideband.c b/sideband.c > > index 6cbfd391c47..1599e408d1b 100644 > > --- a/sideband.c > > +++ b/sideband.c > > @@ -69,9 +69,8 @@ void list_config_color_sideband_slots(struct string_list *list, const char *pref > > * of the line. This should be called for a single line only, which is > > * passed as the first N characters of the SRC array. > > * > > - * NEEDSWORK: use "size_t n" instead for clarity. > > */ > > -static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) > > +static void maybe_colorize_sideband(struct strbuf *dest, const char *src, size_t n) > > { > > int i; > > > > Thanks for working on this. > There is, however, more potential for improvements. > First of all: the "int i" from above. > Further down, we read > for (i = 0; i < ARRAY_SIZE(keywords); i++) { > > However, a size of an array can never be negative, so that > an unsigned data type is a better choice than a signed. > And, arrays can have more elements than an int can address, > at least in theory. > For a reader it makes more sense, to replace > int i; > with > size_t i; > > > And further down, there is another place for improvments: > > int len = strlen(p->keyword); > if (n < len) > continue; > > Even here, a strlen is never negative. And a size_t is the choice for len, > since all "modern" implementations declare strlen() to return size_t > > Do you think that you can have a look at these changes ? > > I will be happy to do a review, and possibly other people as well.
(PLease, avoid top-posting on this list) On Fri, Dec 22, 2023 at 11:57:25PM +0530, Chandra Pratap wrote: > Thanks for the feedback, Torsten. I was working on improving the rest of > sideband.c as you suggested when I encountered this snippet on line 82: > > while (0 < n && isspace(*src)) { > strbuf_addch(dest, *src); > src++; > n--; > } > > Here, we are decreasing the value of an unsigned type to potentially below > 0 which may lead to overflow and result in some nasty bug. In this case, > is it wise to continue with replacing 'int n' with 'size_t n' as the > NEEDSWORK tag suggests or should we improve upon the rest of the file > and revert 'size_t n' to 'int n' ? Yes, that could be a solution. But. In general, we are are striving to use size_t for all objects that live in memory, and that is a long term thing. Careful review is needed, and that is what you just did. If we look at this code again: while (0 < n && isspace(*src)) { strbuf_addch(dest, *src); src++; n--; } When n is signed, it makes sense to use "0 < n". However, if I think about it, it should work for unsigned as well, wouldn't it ? We can leave it as is, or replace it by while (n && isspace(*src)) { strbuf_addch(dest, *src); src++; n--; } > > On Fri, 22 Dec 2023 at 23:27, Torsten Bögershausen <tboegi@web.de> wrote: > > > > On Fri, Dec 22, 2023 at 05:01:09PM +0000, Chandra Pratap via GitGitGadget wrote: > > > From: Chandra Pratap <chandrapratap3519@gmail.com> > > > > > > Replace int with size_t for clarity and remove the > > > 'NEEDSWORK' tag associated with it. > > > > > > Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> > > > --- > > > sideband.c: replace int with size_t for clarity > > > > > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1625%2FChand-ra%2Fdusra-v1 > > > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1625/Chand-ra/dusra-v1 > > > Pull-Request: https://github.com/gitgitgadget/git/pull/1625 > > > > > > sideband.c | 3 +-- > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > > > diff --git a/sideband.c b/sideband.c > > > index 6cbfd391c47..1599e408d1b 100644 > > > --- a/sideband.c > > > +++ b/sideband.c > > > @@ -69,9 +69,8 @@ void list_config_color_sideband_slots(struct string_list *list, const char *pref > > > * of the line. This should be called for a single line only, which is > > > * passed as the first N characters of the SRC array. > > > * > > > - * NEEDSWORK: use "size_t n" instead for clarity. > > > */ > > > -static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) > > > +static void maybe_colorize_sideband(struct strbuf *dest, const char *src, size_t n) > > > { > > > int i; > > > > > > > Thanks for working on this. > > There is, however, more potential for improvements. > > First of all: the "int i" from above. > > Further down, we read > > for (i = 0; i < ARRAY_SIZE(keywords); i++) { > > > > However, a size of an array can never be negative, so that > > an unsigned data type is a better choice than a signed. > > And, arrays can have more elements than an int can address, > > at least in theory. > > For a reader it makes more sense, to replace > > int i; > > with > > size_t i; > > > > > > And further down, there is another place for improvments: > > > > int len = strlen(p->keyword); > > if (n < len) > > continue; > > > > Even here, a strlen is never negative. And a size_t is the choice for len, > > since all "modern" implementations declare strlen() to return size_t > > > > Do you think that you can have a look at these changes ? > > > > I will be happy to do a review, and possibly other people as well. >
Torsten Bögershausen <tboegi@web.de> writes: Just this part. > Further down, we read > for (i = 0; i < ARRAY_SIZE(keywords); i++) { > > However, a size of an array can never be negative, so that > an unsigned data type is a better choice than a signed. > And, arrays can have more elements than an int can address, > at least in theory. > For a reader it makes more sense, to replace > int i; > with > size_t i; It is a very good discipline to use size_t to index into an array whose size is externally controled (e.g., we slurp what the end user or the server on the other end of the connection gave us into a piece of memory we allocate) to avoid integer overflows as "int" is often narrower than "size_t". But this particular one is a Meh; the keywords[] is a small hardcoded array whose size and contents are totally under our control.
"Chandra Pratap via GitGitGadget" <gitgitgadget@gmail.com> writes: > -static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) > +static void maybe_colorize_sideband(struct strbuf *dest, const char *src, size_t n) Changing the type of the paramter alone might be a good start, but does not really help anybody, as (1) the callers are not taught to handle wider integral types for the values they pass and (2) the function uses "int len" it computes internally to compare with "n". There are three callers in demultiplex_sideband(), one of whose paramters is "int len" and is passed to this function in one of these calls. Among the other two, one uses "int linelen" derived from scanning the piece of memory read from sideband via strpbrk(), and the other uses strlen(b) which is the length of the "remainder" of the same buffer after the loop that processes one line at a time using the said strpbrk() is done with the complete lines in the early part. The buffer involved in all of the above stores bytes taken from a packet received via the pkt-line interface, which is capable of transferring only 64kB at a time. I _think_ the most productive use of our time is to replace the NEEDSWORK with a comment saying why it is fine to use "int" here to avoid tempting the next developer to come up with this patch again---they will waste their time to do so without thinking it through if we leave the (incomplete) NEEDSWORK comment, I am afraid.
On Fri, Dec 22, 2023 at 11:01:37AM -0800, Junio C Hamano wrote: > Torsten Bögershausen <tboegi@web.de> writes: > > Just this part. > > > Further down, we read > > for (i = 0; i < ARRAY_SIZE(keywords); i++) { > > > > However, a size of an array can never be negative, so that > > an unsigned data type is a better choice than a signed. > > And, arrays can have more elements than an int can address, > > at least in theory. > > For a reader it makes more sense, to replace > > int i; > > with > > size_t i; > > It is a very good discipline to use size_t to index into an array > whose size is externally controled (e.g., we slurp what the end user > or the server on the other end of the connection gave us into a > piece of memory we allocate) to avoid integer overflows as "int" is > often narrower than "size_t". But this particular one is a Meh; the > keywords[] is a small hardcoded array whose size and contents are > totally under our control. I certainly agree in theory, though I've always erred on the side of always using size_t for indexing into arrays, even if they're small. It removes a potential pitfall if you are working with an externally-controlled array and happen to forget to use size_t. But if there is an existing index variable with type "int", and we can easily validate that it's small, I probably wouldn't bother changing it if I was editing nearby code. Thanks, Taylor
diff --git a/sideband.c b/sideband.c index 6cbfd391c47..1599e408d1b 100644 --- a/sideband.c +++ b/sideband.c @@ -69,9 +69,8 @@ void list_config_color_sideband_slots(struct string_list *list, const char *pref * of the line. This should be called for a single line only, which is * passed as the first N characters of the SRC array. * - * NEEDSWORK: use "size_t n" instead for clarity. */ -static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) +static void maybe_colorize_sideband(struct strbuf *dest, const char *src, size_t n) { int i;