Message ID | 20191025025129.250049-5-emilyshaffer@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | add git-bugreport tool | expand |
Hi Emily, On Thu, 24 Oct 2019, Emily Shaffer wrote: > Teach bugreport to gather the values of config options which are present > in 'git-bugreport-config-whitelist'. > > Many config options are sensitive, and many Git add-ons use config > options which git-core does not know about; it is better only to gather > config options which we know to be safe, rather than excluding options > which we know to be unsafe. Should we still have the `// bugreport-exclude` comments, then? > > Reading the whitelist into memory and sorting it saves us time - > since git_config_bugreport() is called for every option the user has > configured, limiting the file IO to one open/read/close and performing > option lookup in sublinear time is a useful optimization. Maybe we even want a hashmap? That would reduce the time complexity even further. > diff --git a/bugreport.c b/bugreport.c > index ada54fe583..afa4836ab1 100644 > --- a/bugreport.c > +++ b/bugreport.c > @@ -1,10 +1,24 @@ > #include "cache.h" > > #include "bugreport.h" > +#include "config.h" > +#include "exec-cmd.h" > #include "help.h" > #include "run-command.h" > #include "version.h" > > +/** > + * A sorted list of config options which we will add to the bugreport. Managed > + * by 'gather_whitelist(...)'. > + */ > +struct string_list whitelist = STRING_LIST_INIT_DUP; > +struct strbuf configs_and_values = STRBUF_INIT; > + > +// git version --build-options > +// uname -a > +// curl-config --version > +// ldd --version > +// echo $SHELL These comments probably want to move to a single, C style comment, and they probably want to be introduced together with `get_system_info()`. I also have to admit that I might have missed where `$SHELL` was added to the output... > void get_system_info(struct strbuf *sys_info) > { > struct child_process cp = CHILD_PROCESS_INIT; > @@ -53,3 +67,39 @@ void get_system_info(struct strbuf *sys_info) > argv_array_clear(&cp.args); > strbuf_reset(&std_out); > } > + > +void gather_whitelist(struct strbuf *path) > +{ > + struct strbuf tmp = STRBUF_INIT; > + strbuf_read_file(&tmp, path->buf, 0); > + string_list_init(&whitelist, 1); > + string_list_split(&whitelist, tmp.buf, '\n', -1); > + string_list_sort(&whitelist); > +} > + > +int git_config_bugreport(const char *var, const char *value, void *cb) > +{ > + if (string_list_has_string(&whitelist, var)) { > + strbuf_addf(&configs_and_values, > + "%s : %s\n", > + var, value); A quite useful piece of information would be the config source. Not sure whether we can do that outside of `config.c` yet... > + } > + > + return 0; > +} > + > +void get_whitelisted_config(struct strbuf *config_info) > +{ > + struct strbuf path = STRBUF_INIT; > + > + strbuf_addstr(&path, git_exec_path()); > + strbuf_addstr(&path, "/git-bugreport-config-whitelist"); Hmm. I would have expected this patch to come directly after the patch 2/9 that generates that white-list, and I would also have expected that to be pre-sorted, and compiled in. Do you want users to _edit_ the file in the exec path? In general, that path will be write-protected, though. A better alternative would probably be to compile in a hard-coded list, and to allow including more values e.g. by offering command-line options to specify config setting patterns. But if we allow patterns, we might actually want to have those exclusions to prevent sensitive data from being included. > + > + gather_whitelist(&path); > + strbuf_init(&configs_and_values, whitelist.nr); > + > + git_config(git_config_bugreport, NULL); > + > + strbuf_reset(config_info); > + strbuf_addbuf(config_info, &configs_and_values); > +} > diff --git a/bugreport.h b/bugreport.h > index ba216acf3f..7413e7e1be 100644 > --- a/bugreport.h > +++ b/bugreport.h > @@ -5,3 +5,10 @@ > * The previous contents of sys_info will be discarded. > */ > void get_system_info(struct strbuf *sys_info); > + > +/** I also frequently use JavaDoc-style `/**`, but I am not sure that this is actually desired in Git's source code ;-) > + * Adds the values of the config items listed in > + * 'git-bugreport-config-whitelist' to config_info. The previous contents of > + * config_info will be discarded. > + */ > +void get_whitelisted_config(struct strbuf *sys_info); > diff --git a/builtin/bugreport.c b/builtin/bugreport.c > index 7232d31be7..70fe0d2b85 100644 > --- a/builtin/bugreport.c > +++ b/builtin/bugreport.c > @@ -56,6 +56,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix) > get_system_info(&buffer); > strbuf_write(&buffer, report); > > + add_header(report, "Whitelisted Config"); Quite honestly, I would like to avoid the term "whitelist" for good. How about "Selected config settings" instead? Thanks, Dscho > + get_whitelisted_config(&buffer); > + strbuf_write(&buffer, report); > + > fclose(report); > > launch_editor(report_path.buf, NULL, NULL); > -- > 2.24.0.rc0.303.g954a862665-goog > >
On 2019.10.24 19:51, Emily Shaffer wrote: [snip] > diff --git a/bugreport.c b/bugreport.c > index ada54fe583..afa4836ab1 100644 > --- a/bugreport.c > +++ b/bugreport.c > @@ -1,10 +1,24 @@ > #include "cache.h" > > #include "bugreport.h" > +#include "config.h" > +#include "exec-cmd.h" > #include "help.h" > #include "run-command.h" > #include "version.h" > > +/** > + * A sorted list of config options which we will add to the bugreport. Managed > + * by 'gather_whitelist(...)'. > + */ > +struct string_list whitelist = STRING_LIST_INIT_DUP; > +struct strbuf configs_and_values = STRBUF_INIT; > + > +// git version --build-options > +// uname -a > +// curl-config --version > +// ldd --version > +// echo $SHELL > void get_system_info(struct strbuf *sys_info) > { > struct child_process cp = CHILD_PROCESS_INIT; > @@ -53,3 +67,39 @@ void get_system_info(struct strbuf *sys_info) > argv_array_clear(&cp.args); > strbuf_reset(&std_out); > } > + > +void gather_whitelist(struct strbuf *path) This and git_config_bugreport() below should both be static as well. Rather than repeating advice on the later patches, I'll just note that any new functions that don't show up in the corresponding .h file should be marked static.
Josh Steadmon <steadmon@google.com> writes: > This and git_config_bugreport() below should both be static as well. > Rather than repeating advice on the later patches, I'll just note that > any new functions that don't show up in the corresponding .h file should > be marked static. Good advice. More importantly, given that "git bugreport" itself has no service of itself to offer other existing parts of Git (it is just a "gather various pieces of information from different places, and then produce a text file output" application), I do not see much point in it having its own header file that others would #include (i.e. the include file is to define services that are offered by it). If there are common enough service routines invented to support the need of bugreport.c (e.g. perhaps it wants to give more info than what is currently available via the existing API on the contents of in-core index), by definition of being them common enough, they should be added to the header that can be used by both bugreport.c and other existing users of the same subsystem (e.g. if it is about in-core index, perhaps cache.h). It makes perfect sense for bugreport.c to #include header files for the Git internals to collect pieces of information from inside Git, though.
On Wed, Oct 30, 2019 at 10:37:13AM +0900, Junio C Hamano wrote: > Josh Steadmon <steadmon@google.com> writes: > > > This and git_config_bugreport() below should both be static as well. > > Rather than repeating advice on the later patches, I'll just note that > > any new functions that don't show up in the corresponding .h file should > > be marked static. > > Good advice. > > More importantly, given that "git bugreport" itself has no service > of itself to offer other existing parts of Git (it is just a "gather > various pieces of information from different places, and then > produce a text file output" application), I do not see much point in > it having its own header file that others would #include (i.e. the > include file is to define services that are offered by it). If > there are common enough service routines invented to support the > need of bugreport.c (e.g. perhaps it wants to give more info than > what is currently available via the existing API on the contents of > in-core index), by definition of being them common enough, they > should be added to the header that can be used by both bugreport.c > and other existing users of the same subsystem (e.g. if it is about > in-core index, perhaps cache.h). Are you asking me to have only builtin/bugreport.c and implement each function there, and eliminate both <basedir>/bugreport.[ch]? It may be true that deciding to put this functionality into a library "in case someone needs it later" was premature, so I can do that - I just want to make sure I understand you. - Emily
On Mon, Oct 28, 2019 at 03:14:35PM +0100, Johannes Schindelin wrote: > Hi Emily, Sorry for the delay in replying. This work has been backburnered and this mail slipped through the cracks. > > On Thu, 24 Oct 2019, Emily Shaffer wrote: > > > Teach bugreport to gather the values of config options which are present > > in 'git-bugreport-config-whitelist'. > > > > Many config options are sensitive, and many Git add-ons use config > > options which git-core does not know about; it is better only to gather > > config options which we know to be safe, rather than excluding options > > which we know to be unsafe. > > Should we still have the `// bugreport-exclude` comments, then? They were optional (useless) before too. I can remove them if you want; I suppose I like the idea of having precedent if someone wants to build their own internal version with opt-out configs rather than opt-in. I can remove them if we want; it doesn't matter very much to me either way. > > > > > Reading the whitelist into memory and sorting it saves us time - > > since git_config_bugreport() is called for every option the user has > > configured, limiting the file IO to one open/read/close and performing > > option lookup in sublinear time is a useful optimization. > > Maybe we even want a hashmap? That would reduce the time complexity even > further. Sure, we can do it. I'll make that change. > > > diff --git a/bugreport.c b/bugreport.c > > index ada54fe583..afa4836ab1 100644 > > --- a/bugreport.c > > +++ b/bugreport.c > > @@ -1,10 +1,24 @@ > > #include "cache.h" > > > > #include "bugreport.h" > > +#include "config.h" > > +#include "exec-cmd.h" > > #include "help.h" > > #include "run-command.h" > > #include "version.h" > > > > +/** > > + * A sorted list of config options which we will add to the bugreport. Managed > > + * by 'gather_whitelist(...)'. > > + */ > > +struct string_list whitelist = STRING_LIST_INIT_DUP; > > +struct strbuf configs_and_values = STRBUF_INIT; > > + > > +// git version --build-options > > +// uname -a > > +// curl-config --version > > +// ldd --version > > +// echo $SHELL > > These comments probably want to move to a single, C style comment, and > they probably want to be introduced together with `get_system_info()`. Yeah, it's stale and has been removed now. It was less commentary and more todo list for author ;) > > I also have to admit that I might have missed where `$SHELL` was added > to the output... I skipped it entirely since bugreport doesn't run in shell anymore. If you have advice for gathering the user's shell I can try to add it; is there such a difference between, say, a Debian user using bash and a Debian user using zsh? I suppose it could be useful if someone has an issue with GIT_PS1, or with autocompletion. I'll look into gathering it. > > > void get_system_info(struct strbuf *sys_info) > > { > > struct child_process cp = CHILD_PROCESS_INIT; > > @@ -53,3 +67,39 @@ void get_system_info(struct strbuf *sys_info) > > argv_array_clear(&cp.args); > > strbuf_reset(&std_out); > > } > > + > > +void gather_whitelist(struct strbuf *path) > > +{ > > + struct strbuf tmp = STRBUF_INIT; > > + strbuf_read_file(&tmp, path->buf, 0); > > + string_list_init(&whitelist, 1); > > + string_list_split(&whitelist, tmp.buf, '\n', -1); > > + string_list_sort(&whitelist); > > +} > > + > > +int git_config_bugreport(const char *var, const char *value, void *cb) > > +{ > > + if (string_list_has_string(&whitelist, var)) { > > + strbuf_addf(&configs_and_values, > > + "%s : %s\n", > > + var, value); > > A quite useful piece of information would be the config source. Not sure > whether we can do that outside of `config.c` yet... It's possible. I can add it. > > > + } > > + > > + return 0; > > +} > > + > > +void get_whitelisted_config(struct strbuf *config_info) > > +{ > > + struct strbuf path = STRBUF_INIT; > > + > > + strbuf_addstr(&path, git_exec_path()); > > + strbuf_addstr(&path, "/git-bugreport-config-whitelist"); > > Hmm. I would have expected this patch to come directly after the patch > 2/9 that generates that white-list, and I would also have expected that > to be pre-sorted, and compiled in. > > Do you want users to _edit_ the file in the exec path? In general, that > path will be write-protected, though. A better alternative would > probably be to compile in a hard-coded list, and to allow including more > values e.g. by offering command-line options to specify config setting > patterns. But if we allow patterns, we might actually want to have those > exclusions to prevent sensitive data from being included. Hm, interesting. Do we have precedent for compiling in a header generated during the build process? I think I saw one when I was adding this script - I'll take a look. > > > + > > + gather_whitelist(&path); > > + strbuf_init(&configs_and_values, whitelist.nr); > > + > > + git_config(git_config_bugreport, NULL); > > + > > + strbuf_reset(config_info); > > + strbuf_addbuf(config_info, &configs_and_values); > > +} > > diff --git a/bugreport.h b/bugreport.h > > index ba216acf3f..7413e7e1be 100644 > > --- a/bugreport.h > > +++ b/bugreport.h > > @@ -5,3 +5,10 @@ > > * The previous contents of sys_info will be discarded. > > */ > > void get_system_info(struct strbuf *sys_info); > > + > > +/** > > I also frequently use JavaDoc-style `/**`, but I am not sure that this > is actually desired in Git's source code ;-) > > > + * Adds the values of the config items listed in > > + * 'git-bugreport-config-whitelist' to config_info. The previous contents of > > + * config_info will be discarded. > > + */ > > +void get_whitelisted_config(struct strbuf *sys_info); > > diff --git a/builtin/bugreport.c b/builtin/bugreport.c > > index 7232d31be7..70fe0d2b85 100644 > > --- a/builtin/bugreport.c > > +++ b/builtin/bugreport.c > > @@ -56,6 +56,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix) > > get_system_info(&buffer); > > strbuf_write(&buffer, report); > > > > + add_header(report, "Whitelisted Config"); > > Quite honestly, I would like to avoid the term "whitelist" for good. How > about "Selected config settings" instead? Will do - thanks for the callout. > > Thanks, > Dscho > > > + get_whitelisted_config(&buffer); > > + strbuf_write(&buffer, report); > > + > > fclose(report); > > > > launch_editor(report_path.buf, NULL, NULL); > > -- > > 2.24.0.rc0.303.g954a862665-goog > > > >
Hi Emily, On Wed, 11 Dec 2019, Emily Shaffer wrote: > On Mon, Oct 28, 2019 at 03:14:35PM +0100, Johannes Schindelin wrote: > > > > > On Thu, 24 Oct 2019, Emily Shaffer wrote: > > > > > Teach bugreport to gather the values of config options which are > > > present in 'git-bugreport-config-whitelist'. > > > > > > Many config options are sensitive, and many Git add-ons use config > > > options which git-core does not know about; it is better only to > > > gather config options which we know to be safe, rather than > > > excluding options which we know to be unsafe. > > > > Should we still have the `// bugreport-exclude` comments, then? > > They were optional (useless) before too. I can remove them if you want; > I suppose I like the idea of having precedent if someone wants to build > their own internal version with opt-out configs rather than opt-in. I > can remove them if we want; it doesn't matter very much to me either > way. How are you guaranteeing that this information does not become stale, like, immediately? If you _still_ insist on keeping those comments even after answering this question, at least please turn them into C-style comments: we have no C++-style comments in Git and want to keep it this way. > > > diff --git a/bugreport.c b/bugreport.c > > > [...] > > > + } > > > + > > > + return 0; > > > +} > > > + > > > +void get_whitelisted_config(struct strbuf *config_info) > > > +{ > > > + struct strbuf path = STRBUF_INIT; > > > + > > > + strbuf_addstr(&path, git_exec_path()); > > > + strbuf_addstr(&path, "/git-bugreport-config-whitelist"); > > > > Hmm. I would have expected this patch to come directly after the patch > > 2/9 that generates that white-list, and I would also have expected that > > to be pre-sorted, and compiled in. > > > > Do you want users to _edit_ the file in the exec path? In general, that > > path will be write-protected, though. A better alternative would > > probably be to compile in a hard-coded list, and to allow including more > > values e.g. by offering command-line options to specify config setting > > patterns. But if we allow patterns, we might actually want to have those > > exclusions to prevent sensitive data from being included. > > Hm, interesting. Do we have precedent for compiling in a header > generated during the build process? I think I saw one when I was adding > this script - I'll take a look. Yes, we do. The entire `command-list.h` is generated during the build. Look for `GENERATED_H` in the `Makefile`. Ciao, Dscho
diff --git a/bugreport.c b/bugreport.c index ada54fe583..afa4836ab1 100644 --- a/bugreport.c +++ b/bugreport.c @@ -1,10 +1,24 @@ #include "cache.h" #include "bugreport.h" +#include "config.h" +#include "exec-cmd.h" #include "help.h" #include "run-command.h" #include "version.h" +/** + * A sorted list of config options which we will add to the bugreport. Managed + * by 'gather_whitelist(...)'. + */ +struct string_list whitelist = STRING_LIST_INIT_DUP; +struct strbuf configs_and_values = STRBUF_INIT; + +// git version --build-options +// uname -a +// curl-config --version +// ldd --version +// echo $SHELL void get_system_info(struct strbuf *sys_info) { struct child_process cp = CHILD_PROCESS_INIT; @@ -53,3 +67,39 @@ void get_system_info(struct strbuf *sys_info) argv_array_clear(&cp.args); strbuf_reset(&std_out); } + +void gather_whitelist(struct strbuf *path) +{ + struct strbuf tmp = STRBUF_INIT; + strbuf_read_file(&tmp, path->buf, 0); + string_list_init(&whitelist, 1); + string_list_split(&whitelist, tmp.buf, '\n', -1); + string_list_sort(&whitelist); +} + +int git_config_bugreport(const char *var, const char *value, void *cb) +{ + if (string_list_has_string(&whitelist, var)) { + strbuf_addf(&configs_and_values, + "%s : %s\n", + var, value); + } + + return 0; +} + +void get_whitelisted_config(struct strbuf *config_info) +{ + struct strbuf path = STRBUF_INIT; + + strbuf_addstr(&path, git_exec_path()); + strbuf_addstr(&path, "/git-bugreport-config-whitelist"); + + gather_whitelist(&path); + strbuf_init(&configs_and_values, whitelist.nr); + + git_config(git_config_bugreport, NULL); + + strbuf_reset(config_info); + strbuf_addbuf(config_info, &configs_and_values); +} diff --git a/bugreport.h b/bugreport.h index ba216acf3f..7413e7e1be 100644 --- a/bugreport.h +++ b/bugreport.h @@ -5,3 +5,10 @@ * The previous contents of sys_info will be discarded. */ void get_system_info(struct strbuf *sys_info); + +/** + * Adds the values of the config items listed in + * 'git-bugreport-config-whitelist' to config_info. The previous contents of + * config_info will be discarded. + */ +void get_whitelisted_config(struct strbuf *sys_info); diff --git a/builtin/bugreport.c b/builtin/bugreport.c index 7232d31be7..70fe0d2b85 100644 --- a/builtin/bugreport.c +++ b/builtin/bugreport.c @@ -56,6 +56,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix) get_system_info(&buffer); strbuf_write(&buffer, report); + add_header(report, "Whitelisted Config"); + get_whitelisted_config(&buffer); + strbuf_write(&buffer, report); + fclose(report); launch_editor(report_path.buf, NULL, NULL);
Teach bugreport to gather the values of config options which are present in 'git-bugreport-config-whitelist'. Many config options are sensitive, and many Git add-ons use config options which git-core does not know about; it is better only to gather config options which we know to be safe, rather than excluding options which we know to be unsafe. Reading the whitelist into memory and sorting it saves us time - since git_config_bugreport() is called for every option the user has configured, limiting the file IO to one open/read/close and performing option lookup in sublinear time is a useful optimization. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> --- bugreport.c | 50 +++++++++++++++++++++++++++++++++++++++++++++ bugreport.h | 7 +++++++ builtin/bugreport.c | 4 ++++ 3 files changed, 61 insertions(+)