Message ID | 20220920071317.1787-5-thunder.leizhen@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | kallsyms: Optimizes the performance of lookup symbols | expand |
On Tue 2022-09-20 15:13:13, Zhen Lei wrote: > Currently, to search for a symbol, we need to expand the symbols in > 'kallsyms_names' one by one, and then use the expanded string for > comparison. This process can be optimized. > > And now scripts/kallsyms no longer compresses the symbol types, each > symbol type always occupies one byte. So we can first compress the > searched symbol and then make a quick comparison based on the compressed > length and content. In this way, for entries with mismatched lengths, > there is no need to expand and compare strings. And for those matching > lengths, there's no need to expand the symbol. This saves a lot of time. > According to my test results, the average performance of > kallsyms_lookup_name() can be improved by 20 to 30 times. > > The pseudo code of the test case is as follows: > static int stat_find_name(...) > { > start = sched_clock(); > (void)kallsyms_lookup_name(name); > end = sched_clock(); > //Update min, max, cnt, sum > } > > /* > * Traverse all symbols in sequence and collect statistics on the time > * taken by kallsyms_lookup_name() to lookup each symbol. > */ > kallsyms_on_each_symbol(stat_find_name, NULL); > > The test results are as follows (twice): > After : min=5250, max= 726560, avg= 302132 > After : min=5320, max= 726850, avg= 301978 > Before: min=170, max=15949190, avg=7553906 > Before: min=160, max=15877280, avg=7517784 > > The average time consumed is only 4.01% and the maximum time consumed is > only 4.57% of the time consumed before optimization. > > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> > --- > kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 76 insertions(+), 3 deletions(-) > > diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c > index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644 > --- a/kernel/kallsyms.c > +++ b/kernel/kallsyms.c > @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off, > return off; > } > > +static int kallsyms_name_to_tokens(const char *name, char *buf) This is not safe API. It is always needed to pass the size of the buffer. Also it should be called "compress". "token" is just an implementation detail. I would do: static int kallsyms_compress_symbol_name(const char *name, char *buf, size_t size) > +{ > + int i, j, k, n; > + int len, token_len; > + const char *token; > + unsigned char token_idx[KSYM_NAME_LEN]; > + unsigned char token_bak[KSYM_NAME_LEN]; Why do we need two buffers? It should be possible to compress the name in the same buffer as it is done in compress_symbols() in scripts/callsyms.c. > + > + /* > + * n, number of tokens in the string name. > + * token_idx[i], the start index of the ith token. > + * token_idx[n] is used to calculate the length of the last token. > + */ > + n = strlen(name); > + if (n >= KSYM_NAME_LEN) { > + buf[0] = 0; > + return 0; > + } > + for (i = 0; i <= n; i++) > + token_idx[i] = (unsigned char)i; > + > + /* > + * For tokens whose token_len >= 2, a larger index value indicates > + * a higher occurrence frequency. See scripts/kallsyms.c > + */ > + for (i = 255; i >= 0; i--) { > + token = &kallsyms_token_table[kallsyms_token_index[i]]; > + token_len = strlen(token); > + if (token_len <= 1) > + continue; > + > + /* > + * Find and merge two tokens into one. > + * > + * |<-- new_token -->| > + * | token1 | token2 | > + * token_idx[]: j j+1 j+2 > + */ > + for (j = 0; j < n - 1; j++) { > + len = token_idx[j + 2] - token_idx[j]; > + if (len == token_len && > + !strncmp(name + token_idx[j], token, len)) { > + token_bak[token_idx[j]] = (unsigned char)i; > + for (k = j + 1; k < n; k++) > + token_idx[k] = token_idx[k + 1]; > + n--; > + } > + } > + } > + > + for (j = 0; j < n; j++) { > + len = token_idx[j + 1] - token_idx[j]; > + if (len <= 1) { > + buf[j] = name[token_idx[j]]; > + continue; > + } > + > + buf[j] = token_bak[token_idx[j]]; Maybe, I do not understand the compression format correctly but this code looks too complicated. Honestly, I even did not try to understand it. My understanding is the we just need to find all tokens and replace them with index. It should be even easier than compress_symbols() in scripts/callsyms.c. The token_table already exists and we do not need to handle the token_profit... The following looks more strigtforward (not even compile tested): ssize_t len, size; len = strscpy(buf, symname, size); if (WARN_ON_ONCE(len < 0)) return -EINVAL; /* the tokens with higher index are used first */ for (idx = 255; idx >= 0; idx--) { token = &kallsyms_token_table[kallsyms_token_index[i]]; token_len = strlen(token); p1 = buf; /* length of the remaining symname including the trailing '\0' */ remaining = len + 1; /* find the token in the symbol name */ p2 = strstr(token, p1); while (p2) { /* replace token with index */ *p2 = idx; remaining -= ((p2 - p1) + token_len); memmove(p2 + 1, p2 + token_len, remaining); len -= (token_len - 1); p1 = p2; /* find the token in the rest of the symbol name */ p2 = strstr(token, p1); } } return len; > + } > + buf[n] = 0; > + > + return n; > +} > + > /* > * Get symbol type information. This is encoded as a single char at the > * beginning of the symbol name. > @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name) > char namebuf[KSYM_NAME_LEN]; > unsigned long i; > unsigned int off; > + int len; > > /* Skip the search for empty string. */ > if (!*name) > return 0; > > + len = kallsyms_name_to_tokens(name, namebuf); > + for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) { > + if (kallsyms_names[off] == len + 1 && > + !memcmp(&kallsyms_names[off + 2], namebuf, len)) > + return kallsyms_sym_address(i); > + > + off += kallsyms_names[off] + 1; These complicated checks are hard to review. The following looks much more readable to me: for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) { /* the stored symbol name is prefixed by symbol type */ name_len = kallsyms_names[off] - 1; name = &kallsyms_names[off + 2]; off += name_len + 2; if (name_len != len) continue; if (!memcmp(name, namebuf, len)) return kallsyms_sym_address(i); } > + } > + > for (i = 0, off = 0; i < kallsyms_num_syms; i++) { > off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); > > - if (strcmp(namebuf, name) == 0) > - return kallsyms_sym_address(i); > - > if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0) > return kallsyms_sym_address(i); Hmm, it means that the speedup is not usable when kernel is compiled LLVM? It might actually slow down the search because we would need to use both fast and slow search? Best Regards, Petr
On 2022/9/21 23:25, Petr Mladek wrote: > On Tue 2022-09-20 15:13:13, Zhen Lei wrote: >> Currently, to search for a symbol, we need to expand the symbols in >> 'kallsyms_names' one by one, and then use the expanded string for >> comparison. This process can be optimized. >> >> And now scripts/kallsyms no longer compresses the symbol types, each >> symbol type always occupies one byte. So we can first compress the >> searched symbol and then make a quick comparison based on the compressed >> length and content. In this way, for entries with mismatched lengths, >> there is no need to expand and compare strings. And for those matching >> lengths, there's no need to expand the symbol. This saves a lot of time. >> According to my test results, the average performance of >> kallsyms_lookup_name() can be improved by 20 to 30 times. >> >> The pseudo code of the test case is as follows: >> static int stat_find_name(...) >> { >> start = sched_clock(); >> (void)kallsyms_lookup_name(name); >> end = sched_clock(); >> //Update min, max, cnt, sum >> } >> >> /* >> * Traverse all symbols in sequence and collect statistics on the time >> * taken by kallsyms_lookup_name() to lookup each symbol. >> */ >> kallsyms_on_each_symbol(stat_find_name, NULL); >> >> The test results are as follows (twice): >> After : min=5250, max= 726560, avg= 302132 >> After : min=5320, max= 726850, avg= 301978 >> Before: min=170, max=15949190, avg=7553906 >> Before: min=160, max=15877280, avg=7517784 >> >> The average time consumed is only 4.01% and the maximum time consumed is >> only 4.57% of the time consumed before optimization. >> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> >> --- >> kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-- >> 1 file changed, 76 insertions(+), 3 deletions(-) >> >> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c >> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644 >> --- a/kernel/kallsyms.c >> +++ b/kernel/kallsyms.c >> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off, >> return off; >> } >> >> +static int kallsyms_name_to_tokens(const char *name, char *buf) > > This is not safe API. It is always needed to pass the size of the > buffer. OK > > Also it should be called "compress". "token" is just an implementation > detail. > > I would do: > > static int kallsyms_compress_symbol_name(const char *name, > char *buf, size_t size) This's a wonderful name. Thanks. > > >> +{ >> + int i, j, k, n; >> + int len, token_len; >> + const char *token; >> + unsigned char token_idx[KSYM_NAME_LEN]; >> + unsigned char token_bak[KSYM_NAME_LEN]; > > Why do we need two buffers? It should be possible to compress the name > in the same buffer as it is done in compress_symbols() in scripts/callsyms.c. Because the performance would be a little better. Now this function takes just over a microsecond. Currently, it takes about 250 microseconds on average to lookup a symbol, so adding a little more time to this function doesn't affect the overall picture. I'll modify and test it as you suggest below. > >> + >> + /* >> + * n, number of tokens in the string name. >> + * token_idx[i], the start index of the ith token. >> + * token_idx[n] is used to calculate the length of the last token. >> + */ >> + n = strlen(name); >> + if (n >= KSYM_NAME_LEN) { >> + buf[0] = 0; >> + return 0; >> + } >> + for (i = 0; i <= n; i++) >> + token_idx[i] = (unsigned char)i; >> + >> + /* >> + * For tokens whose token_len >= 2, a larger index value indicates >> + * a higher occurrence frequency. See scripts/kallsyms.c >> + */ >> + for (i = 255; i >= 0; i--) { >> + token = &kallsyms_token_table[kallsyms_token_index[i]]; >> + token_len = strlen(token); >> + if (token_len <= 1) >> + continue; >> + >> + /* >> + * Find and merge two tokens into one. >> + * >> + * |<-- new_token -->| >> + * | token1 | token2 | >> + * token_idx[]: j j+1 j+2 >> + */ >> + for (j = 0; j < n - 1; j++) { >> + len = token_idx[j + 2] - token_idx[j]; >> + if (len == token_len && >> + !strncmp(name + token_idx[j], token, len)) { >> + token_bak[token_idx[j]] = (unsigned char)i; >> + for (k = j + 1; k < n; k++) >> + token_idx[k] = token_idx[k + 1]; >> + n--; >> + } >> + } >> + } >> + >> + for (j = 0; j < n; j++) { >> + len = token_idx[j + 1] - token_idx[j]; >> + if (len <= 1) { >> + buf[j] = name[token_idx[j]]; >> + continue; >> + } >> + >> + buf[j] = token_bak[token_idx[j]]; > > Maybe, I do not understand the compression format correctly but > this code looks too complicated. Honestly, I even did not try to > understand it. > > My understanding is the we just need to find all tokens and > replace them with index. > > It should be even easier than compress_symbols() in scripts/callsyms.c. > The token_table already exists and we do not need to handle the token_profit... > > The following looks more strigtforward (not even compile tested): OK, I will try this one. Or refer to compress_symbols() in scripts/callsyms.c. > > ssize_t len, size; > > len = strscpy(buf, symname, size); > if (WARN_ON_ONCE(len < 0)) > return -EINVAL; > > /* the tokens with higher index are used first */ > for (idx = 255; idx >= 0; idx--) { > token = &kallsyms_token_table[kallsyms_token_index[i]]; > token_len = strlen(token); > > p1 = buf; > /* length of the remaining symname including the trailing '\0' */ > remaining = len + 1; > > /* find the token in the symbol name */ > p2 = strstr(token, p1); > > while (p2) { > /* replace token with index */ > *p2 = idx; > remaining -= ((p2 - p1) + token_len); > memmove(p2 + 1, p2 + token_len, remaining); > len -= (token_len - 1); > p1 = p2; > > /* find the token in the rest of the symbol name */ > p2 = strstr(token, p1); > } > } > > return len; > >> + } >> + buf[n] = 0; >> + >> + return n; >> +} >> + >> /* >> * Get symbol type information. This is encoded as a single char at the >> * beginning of the symbol name. >> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name) >> char namebuf[KSYM_NAME_LEN]; >> unsigned long i; >> unsigned int off; >> + int len; >> >> /* Skip the search for empty string. */ >> if (!*name) >> return 0; >> >> + len = kallsyms_name_to_tokens(name, namebuf); >> + for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) { >> + if (kallsyms_names[off] == len + 1 && >> + !memcmp(&kallsyms_names[off + 2], namebuf, len)) >> + return kallsyms_sym_address(i); >> + >> + off += kallsyms_names[off] + 1; > > These complicated checks are hard to review. The following looks much > more readable to me: Yes, it looks well. > > for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) { > /* the stored symbol name is prefixed by symbol type */ > name_len = kallsyms_names[off] - 1; > name = &kallsyms_names[off + 2]; > off += name_len + 2; > > if (name_len != len) > continue; > > if (!memcmp(name, namebuf, len)) > return kallsyms_sym_address(i); > } > > >> + } >> + >> for (i = 0, off = 0; i < kallsyms_num_syms; i++) { >> off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); >> >> - if (strcmp(namebuf, name) == 0) >> - return kallsyms_sym_address(i); >> - >> if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0) >> return kallsyms_sym_address(i); > > Hmm, it means that the speedup is not usable when kernel is compiled LLVM? > It might actually slow down the search because we would need to use > both fast and slow search? Theoretically, I don't think so. A string comparison was removed from the slow search. "if (name_len != len)" is faster than "if (strcmp(namebuf, name) == 0)". Even if they're equal, kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the overall picture. The average lookup time before optimization is millisecond-level. To allay your concerns, I can run a test. Before: min=170, max=15949190, avg=7553906 > > Best Regards, > Petr > . >
On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote: > > > On 2022/9/21 23:25, Petr Mladek wrote: > > On Tue 2022-09-20 15:13:13, Zhen Lei wrote: > >> Currently, to search for a symbol, we need to expand the symbols in > >> 'kallsyms_names' one by one, and then use the expanded string for > >> comparison. This process can be optimized. > >> > >> And now scripts/kallsyms no longer compresses the symbol types, each > >> symbol type always occupies one byte. So we can first compress the > >> searched symbol and then make a quick comparison based on the compressed > >> length and content. In this way, for entries with mismatched lengths, > >> there is no need to expand and compare strings. And for those matching > >> lengths, there's no need to expand the symbol. This saves a lot of time. > >> According to my test results, the average performance of > >> kallsyms_lookup_name() can be improved by 20 to 30 times. > >> > >> The pseudo code of the test case is as follows: > >> static int stat_find_name(...) > >> { > >> start = sched_clock(); > >> (void)kallsyms_lookup_name(name); > >> end = sched_clock(); > >> //Update min, max, cnt, sum > >> } > >> > >> /* > >> * Traverse all symbols in sequence and collect statistics on the time > >> * taken by kallsyms_lookup_name() to lookup each symbol. > >> */ > >> kallsyms_on_each_symbol(stat_find_name, NULL); > >> > >> The test results are as follows (twice): > >> After : min=5250, max= 726560, avg= 302132 > >> After : min=5320, max= 726850, avg= 301978 > >> Before: min=170, max=15949190, avg=7553906 > >> Before: min=160, max=15877280, avg=7517784 > >> > >> The average time consumed is only 4.01% and the maximum time consumed is > >> only 4.57% of the time consumed before optimization. > >> > >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> > >> --- > >> kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-- > >> 1 file changed, 76 insertions(+), 3 deletions(-) > >> > >> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c > >> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644 > >> --- a/kernel/kallsyms.c > >> +++ b/kernel/kallsyms.c > >> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off, > >> +{ > >> + int i, j, k, n; > >> + int len, token_len; > >> + const char *token; > >> + unsigned char token_idx[KSYM_NAME_LEN]; > >> + unsigned char token_bak[KSYM_NAME_LEN]; > > > > Why do we need two buffers? It should be possible to compress the name > > in the same buffer as it is done in compress_symbols() in scripts/callsyms.c. > > Because the performance would be a little better. Now this function takes > just over a microsecond. Currently, it takes about 250 microseconds on > average to lookup a symbol, so adding a little more time to this function > doesn't affect the overall picture. I'll modify and test it as you suggest > below. We need to be careful about a stack overflow. I have seen that KSYM_NAME_LEN might need to be increased to 512 because of Rust support, see https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org > >> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name) > >> for (i = 0, off = 0; i < kallsyms_num_syms; i++) { > >> off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); > >> > >> - if (strcmp(namebuf, name) == 0) > >> - return kallsyms_sym_address(i); > >> - > >> if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0) > >> return kallsyms_sym_address(i); > > > > Hmm, it means that the speedup is not usable when kernel is compiled LLVM? > > It might actually slow down the search because we would need to use > > both fast and slow search? > > Theoretically, I don't think so. A string comparison was removed from the > slow search. "if (name_len != len)" is faster than > "if (strcmp(namebuf, name) == 0)". Even if they're equal, > kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the > overall picture. The average lookup time before optimization is > millisecond-level. > > Before: min=170, max=15949190, avg=7553906 Good point! I agree that the potential extra overhead is negligible when using the old code as a fallback. Best Regards, Petr
On 2022/9/22 10:15, Leizhen (ThunderTown) wrote: > > > On 2022/9/21 23:25, Petr Mladek wrote: >> On Tue 2022-09-20 15:13:13, Zhen Lei wrote: >>> Currently, to search for a symbol, we need to expand the symbols in >>> 'kallsyms_names' one by one, and then use the expanded string for >>> comparison. This process can be optimized. >>> >>> And now scripts/kallsyms no longer compresses the symbol types, each >>> symbol type always occupies one byte. So we can first compress the >>> searched symbol and then make a quick comparison based on the compressed >>> length and content. In this way, for entries with mismatched lengths, >>> there is no need to expand and compare strings. And for those matching >>> lengths, there's no need to expand the symbol. This saves a lot of time. >>> According to my test results, the average performance of >>> kallsyms_lookup_name() can be improved by 20 to 30 times. >>> >>> The pseudo code of the test case is as follows: >>> static int stat_find_name(...) >>> { >>> start = sched_clock(); >>> (void)kallsyms_lookup_name(name); >>> end = sched_clock(); >>> //Update min, max, cnt, sum >>> } >>> >>> /* >>> * Traverse all symbols in sequence and collect statistics on the time >>> * taken by kallsyms_lookup_name() to lookup each symbol. >>> */ >>> kallsyms_on_each_symbol(stat_find_name, NULL); >>> >>> The test results are as follows (twice): >>> After : min=5250, max= 726560, avg= 302132 >>> After : min=5320, max= 726850, avg= 301978 >>> Before: min=170, max=15949190, avg=7553906 >>> Before: min=160, max=15877280, avg=7517784 >>> >>> The average time consumed is only 4.01% and the maximum time consumed is >>> only 4.57% of the time consumed before optimization. >>> >>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> >>> --- >>> kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-- >>> 1 file changed, 76 insertions(+), 3 deletions(-) >>> >>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c >>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644 >>> --- a/kernel/kallsyms.c >>> +++ b/kernel/kallsyms.c >>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off, >>> return off; >>> } >>> >>> +static int kallsyms_name_to_tokens(const char *name, char *buf) >> >> This is not safe API. It is always needed to pass the size of the >> buffer. > > OK > >> >> Also it should be called "compress". "token" is just an implementation >> detail. >> >> I would do: >> >> static int kallsyms_compress_symbol_name(const char *name, >> char *buf, size_t size) > > This's a wonderful name. Thanks. > >> >> >>> +{ >>> + int i, j, k, n; >>> + int len, token_len; >>> + const char *token; >>> + unsigned char token_idx[KSYM_NAME_LEN]; >>> + unsigned char token_bak[KSYM_NAME_LEN]; >> >> Why do we need two buffers? It should be possible to compress the name >> in the same buffer as it is done in compress_symbols() in scripts/callsyms.c. > > Because the performance would be a little better. Now this function takes > just over a microsecond. Currently, it takes about 250 microseconds on > average to lookup a symbol, so adding a little more time to this function > doesn't affect the overall picture. I'll modify and test it as you suggest > below. > >> >>> + >>> + /* >>> + * n, number of tokens in the string name. >>> + * token_idx[i], the start index of the ith token. >>> + * token_idx[n] is used to calculate the length of the last token. >>> + */ >>> + n = strlen(name); >>> + if (n >= KSYM_NAME_LEN) { >>> + buf[0] = 0; >>> + return 0; >>> + } >>> + for (i = 0; i <= n; i++) >>> + token_idx[i] = (unsigned char)i; >>> + >>> + /* >>> + * For tokens whose token_len >= 2, a larger index value indicates >>> + * a higher occurrence frequency. See scripts/kallsyms.c >>> + */ >>> + for (i = 255; i >= 0; i--) { >>> + token = &kallsyms_token_table[kallsyms_token_index[i]]; >>> + token_len = strlen(token); >>> + if (token_len <= 1) >>> + continue; >>> + >>> + /* >>> + * Find and merge two tokens into one. >>> + * >>> + * |<-- new_token -->| >>> + * | token1 | token2 | >>> + * token_idx[]: j j+1 j+2 >>> + */ >>> + for (j = 0; j < n - 1; j++) { >>> + len = token_idx[j + 2] - token_idx[j]; >>> + if (len == token_len && >>> + !strncmp(name + token_idx[j], token, len)) { >>> + token_bak[token_idx[j]] = (unsigned char)i; >>> + for (k = j + 1; k < n; k++) >>> + token_idx[k] = token_idx[k + 1]; >>> + n--; >>> + } >>> + } >>> + } >>> + >>> + for (j = 0; j < n; j++) { >>> + len = token_idx[j + 1] - token_idx[j]; >>> + if (len <= 1) { >>> + buf[j] = name[token_idx[j]]; >>> + continue; >>> + } >>> + >>> + buf[j] = token_bak[token_idx[j]]; >> >> Maybe, I do not understand the compression format correctly but >> this code looks too complicated. Honestly, I even did not try to >> understand it. >> >> My understanding is the we just need to find all tokens and >> replace them with index. >> >> It should be even easier than compress_symbols() in scripts/callsyms.c. >> The token_table already exists and we do not need to handle the token_profit... >> >> The following looks more strigtforward (not even compile tested): > > OK, I will try this one. Or refer to compress_symbols() in scripts/callsyms.c. This method won't work. Because the tokens in kallsyms_token_table[] have been expanded. For example: name = "nfs_fs_proc_net_init" kallsyms_token_table[0xf3] = "s_", raw token = 73 5f kallsyms_token_table[0x9f] = "fs_", raw token = 66 f3 After "s_" have been replaced with f3, there is no "fs_" substring in namebuf. That's why I wrote a new piece of code. Due to I didn't want to add a variable like kallsyms_token_table[]. Now, I will add kallsyms_best_token_table[], kallsyms_best_token_table_len; kallsyms_best_token_table[] does not store tokens that contain only one character. And index=0 is the token with the highest frequency. > >> >> ssize_t len, size; >> >> len = strscpy(buf, symname, size); >> if (WARN_ON_ONCE(len < 0)) >> return -EINVAL; >> >> /* the tokens with higher index are used first */ >> for (idx = 255; idx >= 0; idx--) { >> token = &kallsyms_token_table[kallsyms_token_index[i]]; >> token_len = strlen(token); >> >> p1 = buf; >> /* length of the remaining symname including the trailing '\0' */ >> remaining = len + 1; >> >> /* find the token in the symbol name */ >> p2 = strstr(token, p1); >> >> while (p2) { >> /* replace token with index */ >> *p2 = idx; >> remaining -= ((p2 - p1) + token_len); >> memmove(p2 + 1, p2 + token_len, remaining); >> len -= (token_len - 1); >> p1 = p2; >> >> /* find the token in the rest of the symbol name */ >> p2 = strstr(token, p1); >> } >> } >> >> return len; >> >>> + } >>> + buf[n] = 0; >>> + >>> + return n; >>> +} >>> + >>> /* >>> * Get symbol type information. This is encoded as a single char at the >>> * beginning of the symbol name. >>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name) >>> char namebuf[KSYM_NAME_LEN]; >>> unsigned long i; >>> unsigned int off; >>> + int len; >>> >>> /* Skip the search for empty string. */ >>> if (!*name) >>> return 0; >>> >>> + len = kallsyms_name_to_tokens(name, namebuf); >>> + for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) { >>> + if (kallsyms_names[off] == len + 1 && >>> + !memcmp(&kallsyms_names[off + 2], namebuf, len)) >>> + return kallsyms_sym_address(i); >>> + >>> + off += kallsyms_names[off] + 1; >> >> These complicated checks are hard to review. The following looks much >> more readable to me: > > Yes, it looks well. > >> >> for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) { >> /* the stored symbol name is prefixed by symbol type */ >> name_len = kallsyms_names[off] - 1; >> name = &kallsyms_names[off + 2]; >> off += name_len + 2; >> >> if (name_len != len) >> continue; >> >> if (!memcmp(name, namebuf, len)) >> return kallsyms_sym_address(i); >> } >> >> >>> + } >>> + >>> for (i = 0, off = 0; i < kallsyms_num_syms; i++) { >>> off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); >>> >>> - if (strcmp(namebuf, name) == 0) >>> - return kallsyms_sym_address(i); >>> - >>> if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0) >>> return kallsyms_sym_address(i); >> >> Hmm, it means that the speedup is not usable when kernel is compiled LLVM? >> It might actually slow down the search because we would need to use >> both fast and slow search? > > Theoretically, I don't think so. A string comparison was removed from the > slow search. "if (name_len != len)" is faster than > "if (strcmp(namebuf, name) == 0)". Even if they're equal, > kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the > overall picture. The average lookup time before optimization is > millisecond-level. To allay your concerns, I can run a test. > > Before: min=170, max=15949190, avg=7553906 > >> >> Best Regards, >> Petr >> . >> >
On 2022/9/22 15:02, Petr Mladek wrote: > On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote: >> >> >> On 2022/9/21 23:25, Petr Mladek wrote: >>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote: >>>> Currently, to search for a symbol, we need to expand the symbols in >>>> 'kallsyms_names' one by one, and then use the expanded string for >>>> comparison. This process can be optimized. >>>> >>>> And now scripts/kallsyms no longer compresses the symbol types, each >>>> symbol type always occupies one byte. So we can first compress the >>>> searched symbol and then make a quick comparison based on the compressed >>>> length and content. In this way, for entries with mismatched lengths, >>>> there is no need to expand and compare strings. And for those matching >>>> lengths, there's no need to expand the symbol. This saves a lot of time. >>>> According to my test results, the average performance of >>>> kallsyms_lookup_name() can be improved by 20 to 30 times. >>>> >>>> The pseudo code of the test case is as follows: >>>> static int stat_find_name(...) >>>> { >>>> start = sched_clock(); >>>> (void)kallsyms_lookup_name(name); >>>> end = sched_clock(); >>>> //Update min, max, cnt, sum >>>> } >>>> >>>> /* >>>> * Traverse all symbols in sequence and collect statistics on the time >>>> * taken by kallsyms_lookup_name() to lookup each symbol. >>>> */ >>>> kallsyms_on_each_symbol(stat_find_name, NULL); >>>> >>>> The test results are as follows (twice): >>>> After : min=5250, max= 726560, avg= 302132 >>>> After : min=5320, max= 726850, avg= 301978 >>>> Before: min=170, max=15949190, avg=7553906 >>>> Before: min=160, max=15877280, avg=7517784 >>>> >>>> The average time consumed is only 4.01% and the maximum time consumed is >>>> only 4.57% of the time consumed before optimization. >>>> >>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> >>>> --- >>>> kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-- >>>> 1 file changed, 76 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c >>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644 >>>> --- a/kernel/kallsyms.c >>>> +++ b/kernel/kallsyms.c >>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off, >>>> +{ >>>> + int i, j, k, n; >>>> + int len, token_len; >>>> + const char *token; >>>> + unsigned char token_idx[KSYM_NAME_LEN]; >>>> + unsigned char token_bak[KSYM_NAME_LEN]; >>> >>> Why do we need two buffers? It should be possible to compress the name >>> in the same buffer as it is done in compress_symbols() in scripts/callsyms.c. >> >> Because the performance would be a little better. Now this function takes >> just over a microsecond. Currently, it takes about 250 microseconds on >> average to lookup a symbol, so adding a little more time to this function >> doesn't affect the overall picture. I'll modify and test it as you suggest >> below. > > We need to be careful about a stack overflow. I have seen that > KSYM_NAME_LEN might need to be increased to 512 because of > Rust support, see > https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org OK. Thanks for your information. I decided to add kallsyms_best_token_table[], kallsyms_best_token_table_len, so that we only need one namebuf[], like kallsyms_expand_symbol(). > >>>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name) >>>> for (i = 0, off = 0; i < kallsyms_num_syms; i++) { >>>> off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); >>>> >>>> - if (strcmp(namebuf, name) == 0) >>>> - return kallsyms_sym_address(i); >>>> - >>>> if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0) >>>> return kallsyms_sym_address(i); >>> >>> Hmm, it means that the speedup is not usable when kernel is compiled LLVM? >>> It might actually slow down the search because we would need to use >>> both fast and slow search? >> >> Theoretically, I don't think so. A string comparison was removed from the >> slow search. "if (name_len != len)" is faster than >> "if (strcmp(namebuf, name) == 0)". Even if they're equal, >> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the >> overall picture. The average lookup time before optimization is >> millisecond-level. >> >> Before: min=170, max=15949190, avg=7553906 > > Good point! I agree that the potential extra overhead is negligible > when using the old code as a fallback. > > Best Regards, > Petr > . >
On Thu 2022-09-22 15:21:57, Leizhen (ThunderTown) wrote: > > > On 2022/9/22 15:02, Petr Mladek wrote: > > On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote: > >> > >> > >> On 2022/9/21 23:25, Petr Mladek wrote: > >>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote: > >>>> Currently, to search for a symbol, we need to expand the symbols in > >>>> 'kallsyms_names' one by one, and then use the expanded string for > >>>> comparison. This process can be optimized. > >>>> > >>>> And now scripts/kallsyms no longer compresses the symbol types, each > >>>> symbol type always occupies one byte. So we can first compress the > >>>> searched symbol and then make a quick comparison based on the compressed > >>>> length and content. In this way, for entries with mismatched lengths, > >>>> there is no need to expand and compare strings. And for those matching > >>>> lengths, there's no need to expand the symbol. This saves a lot of time. > >>>> According to my test results, the average performance of > >>>> kallsyms_lookup_name() can be improved by 20 to 30 times. > >>>> > >>>> The pseudo code of the test case is as follows: > >>>> static int stat_find_name(...) > >>>> { > >>>> start = sched_clock(); > >>>> (void)kallsyms_lookup_name(name); > >>>> end = sched_clock(); > >>>> //Update min, max, cnt, sum > >>>> } > >>>> > >>>> /* > >>>> * Traverse all symbols in sequence and collect statistics on the time > >>>> * taken by kallsyms_lookup_name() to lookup each symbol. > >>>> */ > >>>> kallsyms_on_each_symbol(stat_find_name, NULL); > >>>> > >>>> The test results are as follows (twice): > >>>> After : min=5250, max= 726560, avg= 302132 > >>>> After : min=5320, max= 726850, avg= 301978 > >>>> Before: min=170, max=15949190, avg=7553906 > >>>> Before: min=160, max=15877280, avg=7517784 > >>>> > >>>> The average time consumed is only 4.01% and the maximum time consumed is > >>>> only 4.57% of the time consumed before optimization. > >>>> > >>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> > >>>> --- > >>>> kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-- > >>>> 1 file changed, 76 insertions(+), 3 deletions(-) > >>>> > >>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c > >>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644 > >>>> --- a/kernel/kallsyms.c > >>>> +++ b/kernel/kallsyms.c > >>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off, > >>>> +{ > >>>> + int i, j, k, n; > >>>> + int len, token_len; > >>>> + const char *token; > >>>> + unsigned char token_idx[KSYM_NAME_LEN]; > >>>> + unsigned char token_bak[KSYM_NAME_LEN]; > >>> > >>> Why do we need two buffers? It should be possible to compress the name > >>> in the same buffer as it is done in compress_symbols() in scripts/callsyms.c. > >> > >> Because the performance would be a little better. Now this function takes > >> just over a microsecond. Currently, it takes about 250 microseconds on > >> average to lookup a symbol, so adding a little more time to this function > >> doesn't affect the overall picture. I'll modify and test it as you suggest > >> below. > > > > We need to be careful about a stack overflow. I have seen that > > KSYM_NAME_LEN might need to be increased to 512 because of > > Rust support, see > > https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org > > OK. Thanks for your information. I decided to add kallsyms_best_token_table[], > kallsyms_best_token_table_len, so that we only need one namebuf[], like > kallsyms_expand_symbol(). Thanks for the effort. Adding kallsyms_best_token_table[] sounds like the right solution. Best Regards, Petr
On 2022/9/22 15:02, Petr Mladek wrote: > On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote: >> >> >> On 2022/9/21 23:25, Petr Mladek wrote: >>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote: >>>> Currently, to search for a symbol, we need to expand the symbols in >>>> 'kallsyms_names' one by one, and then use the expanded string for >>>> comparison. This process can be optimized. >>>> >>>> And now scripts/kallsyms no longer compresses the symbol types, each >>>> symbol type always occupies one byte. So we can first compress the >>>> searched symbol and then make a quick comparison based on the compressed >>>> length and content. In this way, for entries with mismatched lengths, >>>> there is no need to expand and compare strings. And for those matching >>>> lengths, there's no need to expand the symbol. This saves a lot of time. >>>> According to my test results, the average performance of >>>> kallsyms_lookup_name() can be improved by 20 to 30 times. >>>> >>>> The pseudo code of the test case is as follows: >>>> static int stat_find_name(...) >>>> { >>>> start = sched_clock(); >>>> (void)kallsyms_lookup_name(name); >>>> end = sched_clock(); >>>> //Update min, max, cnt, sum >>>> } >>>> >>>> /* >>>> * Traverse all symbols in sequence and collect statistics on the time >>>> * taken by kallsyms_lookup_name() to lookup each symbol. >>>> */ >>>> kallsyms_on_each_symbol(stat_find_name, NULL); >>>> >>>> The test results are as follows (twice): >>>> After : min=5250, max= 726560, avg= 302132 >>>> After : min=5320, max= 726850, avg= 301978 >>>> Before: min=170, max=15949190, avg=7553906 >>>> Before: min=160, max=15877280, avg=7517784 >>>> >>>> The average time consumed is only 4.01% and the maximum time consumed is >>>> only 4.57% of the time consumed before optimization. >>>> >>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> >>>> --- >>>> kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-- >>>> 1 file changed, 76 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c >>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644 >>>> --- a/kernel/kallsyms.c >>>> +++ b/kernel/kallsyms.c >>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off, >>>> +{ >>>> + int i, j, k, n; >>>> + int len, token_len; >>>> + const char *token; >>>> + unsigned char token_idx[KSYM_NAME_LEN]; >>>> + unsigned char token_bak[KSYM_NAME_LEN]; >>> >>> Why do we need two buffers? It should be possible to compress the name >>> in the same buffer as it is done in compress_symbols() in scripts/callsyms.c. >> >> Because the performance would be a little better. Now this function takes >> just over a microsecond. Currently, it takes about 250 microseconds on >> average to lookup a symbol, so adding a little more time to this function >> doesn't affect the overall picture. I'll modify and test it as you suggest >> below. > > We need to be careful about a stack overflow. I have seen that > KSYM_NAME_LEN might need to be increased to 512 because of > Rust support, see > https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org > >>>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name) >>>> for (i = 0, off = 0; i < kallsyms_num_syms; i++) { >>>> off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); >>>> >>>> - if (strcmp(namebuf, name) == 0) >>>> - return kallsyms_sym_address(i); >>>> - >>>> if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0) >>>> return kallsyms_sym_address(i); >>> >>> Hmm, it means that the speedup is not usable when kernel is compiled LLVM? >>> It might actually slow down the search because we would need to use >>> both fast and slow search? >> >> Theoretically, I don't think so. A string comparison was removed from the >> slow search. "if (name_len != len)" is faster than >> "if (strcmp(namebuf, name) == 0)". Even if they're equal, >> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the >> overall picture. The average lookup time before optimization is >> millisecond-level. >> >> Before: min=170, max=15949190, avg=7553906 > > Good point! I agree that the potential extra overhead is negligible > when using the old code as a fallback. These days sleep better. When I got up this morning, my subconscious told me that compiled LLVM could also be optimized. In fact, the method is simple, that is, check whether the next token starts with '.' or '$' after being expanded. I will post v7 before the holidays. > > Best Regards, > Petr > . >
On 2022/9/28 9:35, Leizhen (ThunderTown) wrote: > > > On 2022/9/22 15:02, Petr Mladek wrote: >> On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote: >>> >>> >>> On 2022/9/21 23:25, Petr Mladek wrote: >>>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote: >>>>> Currently, to search for a symbol, we need to expand the symbols in >>>>> 'kallsyms_names' one by one, and then use the expanded string for >>>>> comparison. This process can be optimized. >>>>> >>>>> And now scripts/kallsyms no longer compresses the symbol types, each >>>>> symbol type always occupies one byte. So we can first compress the >>>>> searched symbol and then make a quick comparison based on the compressed >>>>> length and content. In this way, for entries with mismatched lengths, >>>>> there is no need to expand and compare strings. And for those matching >>>>> lengths, there's no need to expand the symbol. This saves a lot of time. >>>>> According to my test results, the average performance of >>>>> kallsyms_lookup_name() can be improved by 20 to 30 times. >>>>> >>>>> The pseudo code of the test case is as follows: >>>>> static int stat_find_name(...) >>>>> { >>>>> start = sched_clock(); >>>>> (void)kallsyms_lookup_name(name); >>>>> end = sched_clock(); >>>>> //Update min, max, cnt, sum >>>>> } >>>>> >>>>> /* >>>>> * Traverse all symbols in sequence and collect statistics on the time >>>>> * taken by kallsyms_lookup_name() to lookup each symbol. >>>>> */ >>>>> kallsyms_on_each_symbol(stat_find_name, NULL); >>>>> >>>>> The test results are as follows (twice): >>>>> After : min=5250, max= 726560, avg= 302132 >>>>> After : min=5320, max= 726850, avg= 301978 >>>>> Before: min=170, max=15949190, avg=7553906 >>>>> Before: min=160, max=15877280, avg=7517784 >>>>> >>>>> The average time consumed is only 4.01% and the maximum time consumed is >>>>> only 4.57% of the time consumed before optimization. >>>>> >>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> >>>>> --- >>>>> kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-- >>>>> 1 file changed, 76 insertions(+), 3 deletions(-) >>>>> >>>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c >>>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644 >>>>> --- a/kernel/kallsyms.c >>>>> +++ b/kernel/kallsyms.c >>>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off, >>>>> +{ >>>>> + int i, j, k, n; >>>>> + int len, token_len; >>>>> + const char *token; >>>>> + unsigned char token_idx[KSYM_NAME_LEN]; >>>>> + unsigned char token_bak[KSYM_NAME_LEN]; >>>> >>>> Why do we need two buffers? It should be possible to compress the name >>>> in the same buffer as it is done in compress_symbols() in scripts/callsyms.c. >>> >>> Because the performance would be a little better. Now this function takes >>> just over a microsecond. Currently, it takes about 250 microseconds on >>> average to lookup a symbol, so adding a little more time to this function >>> doesn't affect the overall picture. I'll modify and test it as you suggest >>> below. >> >> We need to be careful about a stack overflow. I have seen that >> KSYM_NAME_LEN might need to be increased to 512 because of >> Rust support, see >> https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org >> >>>>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name) >>>>> for (i = 0, off = 0; i < kallsyms_num_syms; i++) { >>>>> off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); >>>>> >>>>> - if (strcmp(namebuf, name) == 0) >>>>> - return kallsyms_sym_address(i); >>>>> - >>>>> if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0) >>>>> return kallsyms_sym_address(i); >>>> >>>> Hmm, it means that the speedup is not usable when kernel is compiled LLVM? >>>> It might actually slow down the search because we would need to use >>>> both fast and slow search? >>> >>> Theoretically, I don't think so. A string comparison was removed from the >>> slow search. "if (name_len != len)" is faster than >>> "if (strcmp(namebuf, name) == 0)". Even if they're equal, >>> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the >>> overall picture. The average lookup time before optimization is >>> millisecond-level. >>> >>> Before: min=170, max=15949190, avg=7553906 >> >> Good point! I agree that the potential extra overhead is negligible >> when using the old code as a fallback. > > These days sleep better. When I got up this morning, my subconscious told me that > compiled LLVM could also be optimized. In fact, the method is simple, that is, > check whether the next token starts with '.' or '$' after being expanded. > > I will post v7 before the holidays. Sorry, I'm going to break my promise. A lot of code needs to be modified on the tool side, to make sure that the first '.' or '$' will not be in the middle or end of the expanded substring. The lab is powered off, so I can only post v7 after the holidays (one week). > >> >> Best Regards, >> Petr >> . >> >
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off, return off; } +static int kallsyms_name_to_tokens(const char *name, char *buf) +{ + int i, j, k, n; + int len, token_len; + const char *token; + unsigned char token_idx[KSYM_NAME_LEN]; + unsigned char token_bak[KSYM_NAME_LEN]; + + /* + * n, number of tokens in the string name. + * token_idx[i], the start index of the ith token. + * token_idx[n] is used to calculate the length of the last token. + */ + n = strlen(name); + if (n >= KSYM_NAME_LEN) { + buf[0] = 0; + return 0; + } + for (i = 0; i <= n; i++) + token_idx[i] = (unsigned char)i; + + /* + * For tokens whose token_len >= 2, a larger index value indicates + * a higher occurrence frequency. See scripts/kallsyms.c + */ + for (i = 255; i >= 0; i--) { + token = &kallsyms_token_table[kallsyms_token_index[i]]; + token_len = strlen(token); + if (token_len <= 1) + continue; + + /* + * Find and merge two tokens into one. + * + * |<-- new_token -->| + * | token1 | token2 | + * token_idx[]: j j+1 j+2 + * + */ + for (j = 0; j < n - 1; j++) { + len = token_idx[j + 2] - token_idx[j]; + if (len == token_len && + !strncmp(name + token_idx[j], token, len)) { + token_bak[token_idx[j]] = (unsigned char)i; + for (k = j + 1; k < n; k++) + token_idx[k] = token_idx[k + 1]; + n--; + } + } + } + + for (j = 0; j < n; j++) { + len = token_idx[j + 1] - token_idx[j]; + if (len <= 1) { + buf[j] = name[token_idx[j]]; + continue; + } + + buf[j] = token_bak[token_idx[j]]; + } + buf[n] = 0; + + return n; +} + /* * Get symbol type information. This is encoded as a single char at the * beginning of the symbol name. @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name) char namebuf[KSYM_NAME_LEN]; unsigned long i; unsigned int off; + int len; /* Skip the search for empty string. */ if (!*name) return 0; + len = kallsyms_name_to_tokens(name, namebuf); + for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) { + if (kallsyms_names[off] == len + 1 && + !memcmp(&kallsyms_names[off + 2], namebuf, len)) + return kallsyms_sym_address(i); + + off += kallsyms_names[off] + 1; + } + for (i = 0, off = 0; i < kallsyms_num_syms; i++) { off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); - if (strcmp(namebuf, name) == 0) - return kallsyms_sym_address(i); - if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0) return kallsyms_sym_address(i); } + return module_kallsyms_lookup_name(name); }
Currently, to search for a symbol, we need to expand the symbols in 'kallsyms_names' one by one, and then use the expanded string for comparison. This process can be optimized. And now scripts/kallsyms no longer compresses the symbol types, each symbol type always occupies one byte. So we can first compress the searched symbol and then make a quick comparison based on the compressed length and content. In this way, for entries with mismatched lengths, there is no need to expand and compare strings. And for those matching lengths, there's no need to expand the symbol. This saves a lot of time. According to my test results, the average performance of kallsyms_lookup_name() can be improved by 20 to 30 times. The pseudo code of the test case is as follows: static int stat_find_name(...) { start = sched_clock(); (void)kallsyms_lookup_name(name); end = sched_clock(); //Update min, max, cnt, sum } /* * Traverse all symbols in sequence and collect statistics on the time * taken by kallsyms_lookup_name() to lookup each symbol. */ kallsyms_on_each_symbol(stat_find_name, NULL); The test results are as follows (twice): After : min=5250, max= 726560, avg= 302132 After : min=5320, max= 726850, avg= 301978 Before: min=170, max=15949190, avg=7553906 Before: min=160, max=15877280, avg=7517784 The average time consumed is only 4.01% and the maximum time consumed is only 4.57% of the time consumed before optimization. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> --- kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 76 insertions(+), 3 deletions(-)