Message ID | pull.1744.v2.git.git.1721821503173.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | [v2] Fix to avoid high memory footprint |
"Haritha via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: D Harithamma <harithamma.d@ibm.com>
>
> This fix avoids high memory footprint when adding files that require
> conversion. Git has a trace_encoding routine that prints trace output
> when GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment
> variable is used to debug the encoding contents. When a 40MB file is
> added, it requests close to 1.8GB of storage from xrealloc which can
> lead to out of memory errors. However, the check for
> GIT_TRACE_WORKING_TREE_ENCODING is done after the string is allocated.
> This resolves high memory footprints even when
> GIT_TRACE_WORKING_TREE_ENCODING is not active. This fix adds an early
> exit to avoid the unnecessary memory allocation.

The sentences jump around and the logic flow is hard to follow. The
first sentence makes a claim of what it does (but the readers have not
been told where that problem comes from). The second sentence makes a
statement of a fact, but the readers do not yet know at that point what
relevance the fact has to the issue at hand, etc.

The usual way to compose a log message of this project is to

 - Give an observation on how the current system works in the present
   tense (so no need to say "Currently X is Y", just "X is Y"), and
   discuss what you perceive as a problem in it.

 - Propose a solution (optional---often, problem description
   trivially leads to an obvious solution in reader's minds).

 - Give commands to the codebase to "become like so".

in this order.

    When Git needs to add a file that requires encoding conversion,
    but tracing of encoding conversion is *not* requested via setting
    GIT_TRACE_WORKING_TREE_ENCODING environment variable, the
    trace_encoding() function still allocated and prepared "human
    readable" copies of the file contents before and after conversion
    to show in the trace. This wasted a lot of memory footprint and
    runtime cycles without giving any user-visible benefit.

    Exit early from the function when we are not tracing before we
    spend all the effort, not after.

or something, perhaps?

I am wondering if we should be able to test this, but "git grep
GIT_TRACE_WORKING_TREE_ENCODING t/" is not finding any existing test in
the area.

> Signed-off-by: Harithamma D <harithamma.d@ibm.com>

This does not match the "From: " line above. Please pick one way to
spell your name and identify yourself to this project, and use it
consistently.

Thanks.

> diff --git a/convert.c b/convert.c
> index d8737fe0f2d..c4ddc4de81b 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -324,6 +324,9 @@ static void trace_encoding(const char *context, const char *path,
>  	struct strbuf trace = STRBUF_INIT;
>  	int i;
>
> +	if (!trace_want(&coe))
> +		return;
> +

The actual fix is so simple and nice ;-)

On Wed, Jul 24, 2024 at 11:45:03AM +0000, Haritha via GitGitGadget wrote:

> diff --git a/convert.c b/convert.c
> index d8737fe0f2d..c4ddc4de81b 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -324,6 +324,9 @@ static void trace_encoding(const char *context, const char *path,
>  	struct strbuf trace = STRBUF_INIT;
>  	int i;
>
> +	if (!trace_want(&coe))
> +		return;
> +
>  	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
>  	for (i = 0; i < len && buf; ++i) {
>  		strbuf_addf(

The patch itself looks good. I confirmed that running:

  git init
  dd if=/dev/zero of=foo.bin bs=1M count=50
  echo '*.bin working-tree-encoding=UTF-16LE' >.gitattributes
  valgrind --tool=massif git add .

goes from a max heap of 1.7G down to 51MB with your patch (whereas I
think with the previous iteration it would not have, since the old check
did the wrong thing on the first call to trace_encoding()).

-Peff

diff --git a/convert.c b/convert.c
index d8737fe0f2d..c4ddc4de81b 100644
--- a/convert.c
+++ b/convert.c
@@ -324,6 +324,9 @@ static void trace_encoding(const char *context, const char *path,
 	struct strbuf trace = STRBUF_INIT;
 	int i;
 
+	if (!trace_want(&coe))
+		return;
+
 	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
 	for (i = 0; i < len && buf; ++i) {
 		strbuf_addf(
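
The idea behind the hunk, asking whether tracing is wanted before any
per-byte formatting is done, can be sketched outside of Git roughly as
follows. This is only an illustration under stated assumptions: every
demo_* name, the DEMO_TRACE_ENCODING variable, and the "%02x " dump
format are hypothetical stand-ins and are not Git's trace or strbuf API.
The byte counter exists solely to make the saving observable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lazily initialized trace key (not Git's struct). */
struct demo_trace_key {
	const char *env;   /* environment variable that switches tracing on */
	int initialized;   /* has the environment been consulted yet? */
	int enabled;       /* cached answer */
};

static struct demo_trace_key demo_key = { "DEMO_TRACE_ENCODING", 0, 0 };

/* Bytes requested for dump buffers, kept only to make the effect observable. */
static size_t demo_dump_bytes;

/* Consult the environment on first use, then return the cached answer. */
static int demo_trace_want(struct demo_trace_key *key)
{
	if (!key->initialized) {
		const char *v = getenv(key->env);
		key->enabled = v && *v && strcmp(v, "0") != 0;
		key->initialized = 1;
	}
	return key->enabled;
}

/*
 * Print a hex dump of buf to stderr when tracing is wanted.
 * Returns 1 if a dump was built, 0 if nothing was allocated.
 */
static int demo_trace_dump(struct demo_trace_key *key, const char *context,
			   const unsigned char *buf, size_t len)
{
	size_t cap, pos = 0, i;
	char *out;

	/* Early exit: decide before allocating, not after. */
	if (!demo_trace_want(key))
		return 0;

	/* "context:\n", three characters per byte, final "\n" and NUL. */
	cap = strlen(context) + 2 + len * 3 + 2;
	out = malloc(cap);
	if (!out)
		return 0;
	demo_dump_bytes += cap;

	pos += (size_t)snprintf(out + pos, cap - pos, "%s:\n", context);
	for (i = 0; i < len; i++)
		pos += (size_t)snprintf(out + pos, cap - pos, "%02x ", (unsigned)buf[i]);
	pos += (size_t)snprintf(out + pos, cap - pos, "\n");
	assert(pos < cap);

	fputs(out, stderr);
	free(out);
	return 1;
}
```

With tracing off, demo_trace_dump(&demo_key, "encode", data, n) returns 0
and never reaches malloc(), so the cost of the call no longer depends on
the size of the input. With tracing on, the buffer grows by a constant
factor per input byte, which is the kind of blow-up the thread measures.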