
[RFC,bpf-next,v2,0/3] bpf: support string key in htab

Message ID 20220214111337.3539-1-houtao1@huawei.com (mailing list archive)

Message

Hou Tao Feb. 14, 2022, 11:13 a.m. UTC
Hi,

In order to use a string as a hash-table key, key_size must be the
storage size of the longest string. If there are large differences in
string length, the hash distribution will be sub-optimal due to the
unused zero bytes in shorter strings, and lookups will be inefficient
due to unnecessary memcmp().

Also, it is possible that the unused part of a string key returned from a
bpf helper (e.g. bpf_d_path) is not zeroed, and if it is used directly as
a lookup key, the lookup will fail with -ENOENT (as reported in [1]).

This patchset tries to address the inefficiency by adding support for
string keys. v1 had an extensibility problem because the string key and
its optimization were only available for string-only keys. To make it
extensible, v2 introduces bpf_str_key_stor and bpf_str_key_desc and
enforces the layout of the hash key struct through BTF as follows:

	>the start of hash key
	...
	[struct bpf_str_key_desc m;]
	...
	[struct bpf_str_key_desc n;]
	...
	struct bpf_str_key_stor z;
	unsigned char raw[N];
	>the end of hash key

So if there is string-only key, the struct of hash key will be:

	struct key {
		struct bpf_str_key_stor comm;
		unsigned char raw[128];
	};

If there are other fields in the hash key, the struct will be:

	struct key {
		int pid;
		struct bpf_str_key_stor comm;
		unsigned char raw[128];
	};

If there are multiple strings in the hash key, the struct will become:

	struct key {
		int pid;
		struct bpf_str_key_desc path;
		struct bpf_str_key_desc comm;
		unsigned char raw[128 + 128];
	};

See patches #1 and #3 for more details on how these keys are manipulated
and used. Patch #2 adds a simple test that demonstrates how the string key
solves the reported problem ([1]) caused by the unused part of the hash key.

The benchmark shows about 180% and 170% improvement under x86-64 and
arm64 respectively when key_size is 252, and about 280% and 270% when
key_size is greater than 512.

The improvement was also measured by using all file paths under the Linux
kernel source tree as the string key input. There are about 74k files and
the maximum string length is 101. When key_size is 108, there is about a
71% and 39% lookup-performance win under x86-64 and arm64 respectively,
and when key_size is 252, the win increases to 150% and 94%.

The patchset is still in early stage of development, so any comments and
suggestions are always welcome.

Regards,
Tao

Change Log
v2:
  * make the string key extensible for hash keys that are not string-only

v1: https://lore.kernel.org/bpf/20211219052245.791605-1-houtao1@huawei.com/

[1]: https://lore.kernel.org/bpf/20211120051839.28212-2-yunbo.xufeng@linux.alibaba.com

Hou Tao (3):
  bpf: add support for string in hash table key
  selftests/bpf: add a simple test for htab str key
  selftests/bpf: add benchmark for string-key hash table

 include/linux/btf.h                           |   3 +
 include/uapi/linux/bpf.h                      |  19 +
 kernel/bpf/btf.c                              |  39 ++
 kernel/bpf/hashtab.c                          | 162 ++++++-
 tools/include/uapi/linux/bpf.h                |  19 +
 tools/testing/selftests/bpf/Makefile          |   4 +-
 tools/testing/selftests/bpf/bench.c           |  14 +
 .../selftests/bpf/benchs/bench_str_htab.c     | 449 ++++++++++++++++++
 .../testing/selftests/bpf/benchs/run_htab.sh  |  11 +
 .../selftests/bpf/prog_tests/str_key.c        |  71 +++
 .../selftests/bpf/progs/str_htab_bench.c      | 224 +++++++++
 tools/testing/selftests/bpf/progs/str_key.c   |  75 +++
 12 files changed, 1064 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_str_htab.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_htab.sh
 create mode 100644 tools/testing/selftests/bpf/prog_tests/str_key.c
 create mode 100644 tools/testing/selftests/bpf/progs/str_htab_bench.c
 create mode 100644 tools/testing/selftests/bpf/progs/str_key.c

Comments

Alexei Starovoitov Feb. 17, 2022, 3:50 a.m. UTC | #1
On Mon, Feb 14, 2022 at 07:13:34PM +0800, Hou Tao wrote:
> Hi,
> 
> In order to use string as hash-table key, key_size must be the storage
> size of longest string. If there are large differencies in string
> length, the hash distribution will be sub-optimal due to the unused
> zero bytes in shorter strings and the lookup will be inefficient due to
> unnecessary memcmp().
> 
> Also it is possible the unused part of string key returned from bpf helper
> (e.g. bpf_d_path) is not mem-zeroed and if using it directly as lookup key,
> the lookup will fail with -ENOENT (as reported in [1]).
> 
> The patchset tries to address the inefficiency by adding support for
> string key. There is extensibility problem in v1 because the string key
> and its optimization is only available for string-only key. To make it
> extensible, v2 introduces bpf_str_key_stor and bpf_str_key_desc and enforce
> the layout of hash key struct through BTF as follows:
> 
> 	>the start of hash key
> 	...
> 	[struct bpf_str_key_desc m;]
> 	...
> 	[struct bpf_str_key_desc n;]
> 	...
> 	struct bpf_str_key_stor z;
> 	unsigned char raw[N];
> 	>the end of hash key

Sorry, but this is dead end.
The v2 didn't fundamentally change the design.
The bpf_str_key_desc has an offset field, but it's unused.
The len field is dynamically checked at run-time and all hash maps
users will be paying that penalty though majority will not be
using this feature.
This patch set is incredibly specific solution to one task.
It's far from being generic. We cannot extend bpf this way.
All building blocks must be as generic as possible.
If you want strings as a key then the key must be variable size.
This patch doesn't make them so. It still requires some
predefined fixed size for largest string. This is no go.
Variable sized key means truly variable to megabytes long.
The key would need to contain an object (a pointer wrap) to
a variable sized object. And this object can be arbitrary
blob of bytes. Not just null terminated string.
We've been thinking about "dynamic pointer" concept where
pointer + length will be represented as an object.
The program will be able to allocate it and persist into a map value
and potentially into a map key.
For storing a bunch of strings one can use a strong hash and store
that hash in the key while keeping full string as a variable sized
object inside the value.
Another approach is to introduce a trie to store strings, or dfa,
or aho-corasick, etc. There are plenty of data structures that are
more suitable for storing and searching strings depending on the use case.
Using hash table for strings has its downsides.
Hou Tao Feb. 18, 2022, 1:53 p.m. UTC | #2
Hi,

On 2/17/2022 11:50 AM, Alexei Starovoitov wrote:
> On Mon, Feb 14, 2022 at 07:13:34PM +0800, Hou Tao wrote:
>> Hi,
>>
>> In order to use string as hash-table key, key_size must be the storage
>> size of longest string. If there are large differencies in string
>> length, the hash distribution will be sub-optimal due to the unused
>> zero bytes in shorter strings and the lookup will be inefficient due to
>> unnecessary memcmp().
>>
>> Also it is possible the unused part of string key returned from bpf helper
>> (e.g. bpf_d_path) is not mem-zeroed and if using it directly as lookup key,
>> the lookup will fail with -ENOENT (as reported in [1]).
>>
>> The patchset tries to address the inefficiency by adding support for
>> string key. There is extensibility problem in v1 because the string key
>> and its optimization is only available for string-only key. To make it
>> extensible, v2 introduces bpf_str_key_stor and bpf_str_key_desc and enforce
>> the layout of hash key struct through BTF as follows:
>>
>> 	>the start of hash key
>> 	...
>> 	[struct bpf_str_key_desc m;]
>> 	...
>> 	[struct bpf_str_key_desc n;]
>> 	...
>> 	struct bpf_str_key_stor z;
>> 	unsigned char raw[N];
>> 	>the end of hash key
> Sorry, but this is dead end.
> The v2 didn't fundamentally change the design.
> The bpf_str_key_desc has an offset field, but it's unused.
It is used during key comparison to ensure the offsets and lengths of the
sub-strings are the same, and it can also be used to locate a sub-string
in the raw array.
> The len field is dynamically checked at run-time and all hash maps
> users will be paying that penalty though majority will not be
> using this feature.
> This patch set is incredibly specific solution to one task.
> It's far from being generic. We cannot extend bpf this way.
> All building blocks must be as generic as possible.
Yes, you are right. I worked in the wrong direction. I tried to keep
the change set small and the performance good, but forgot that it is
not generic enough.
> If you want strings as a key then the key must be variable size.
> This patch doesn't make them so. It still requires some
> predefined fixed size for largest string. This is no go.
> Variable sized key means truly variable to megabytes long.
> The key would need to contain an object (a pointer wrap) to
> a variable sized object. And this object can be arbitrary
> blob of bytes. Not just null terminated string.
> We've been thinking about "dynamic pointer" concept where
> pointer + length will be represented as an object.
I have seen the proposal from Joanne Koong on "dynamic pointers".
It can solve the storage problem for strings, but string lookup is
still a problem. Hashing is an option, but could we support two
dynamic pointers pointing to the same internal object and use the
id of the internal object to represent the string?
> The program will be able to allocate it and persist into a map value
> and potentially into a map key.
> For storing a bunch of strings one can use a strong hash and store
> that hash in the key while keeping full string as a variable sized
> object inside the value.
Won't using a strong hash function hurt lookup performance, since each
lookup will need to calculate the hash?

> Another approach is to introduce a trie to store strings, or dfa,
> or aho-corasick, etc. There are plenty of data structures that are
> more suitable for storing and searching strings depending on the use case.
> Using hash table for strings has its downsides.
Before adding support for string keys in the hash table, we had
implemented tries, a ternary search tree and a hash table in user space
for string lookup. The hash table showed better performance and memory
usage, so we decided to add string support to the hash table. We will
revisit our tests and investigate other string data structures.

Regards,
Tao
Alexei Starovoitov Feb. 19, 2022, 6:44 p.m. UTC | #3
On Fri, Feb 18, 2022 at 5:54 AM Hou Tao <houtao1@huawei.com> wrote:
>
> > We've been thinking about "dynamic pointer" concept where
> > pointer + length will be represented as an object.
> I have seen the proposal from Joanne Koong on "dynamic pointers".
> It can solve the storage problem of string, but the lookup of
> string is still a problem. Hash is a option but can we support
> two dynamic pointers points to the same internal object and use
> the id of the internal object to represent the string ?

Let's have a discussion about dynptr in that thread.

> > The program will be able to allocate it and persist into a map value
> > and potentially into a map key.
> > For storing a bunch of strings one can use a strong hash and store
> > that hash in the key while keeping full string as a variable sized
> > object inside the value.
> Will using a strong hash function impact the lookup performance because
> each lookup will need to calculate the hash ?
>
> > Another approach is to introduce a trie to store strings, or dfa,
> > or aho-corasick, etc. There are plenty of data structures that are
> > more suitable for storing and searching strings depending on the use case.
> > Using hash table for strings has its downsides.
> > .
> Before add support for string key in hash table, we had implement tries,
> ternary search tree and hash table in user-space for string lookup. hash
> table shows better performance and memory usage, so we decide to do string
> support in hash table. We will revisit our tests and investigate new string
> data structures.

What performance characteristics did you consider?
Just the speed of the lookup ?
How many elements are there in the map ?
Is it 80% or 10% full ?
Did you compare memory usage ?
Did you consider the speed of update/delete ?
preallocated or dynamic alloc ?

With dynamic pointers and further verifier improvements bpf programs
will be able to implement a hash table on their own.
Alexei Starovoitov Feb. 27, 2022, 3:08 a.m. UTC | #4
On Sat, Feb 26, 2022 at 4:16 AM Hou Tao <houtao1@huawei.com> wrote:
>
> For now, our case is a write-once case, so only lookup is considered.
> When data set is bigger than 128KB, hash table has better lookup performance as
> show below:
>
> | lookup all elem (ms) | 4K  | 16K  | 64K  | 128K  | 256K  | 512K  | 1M     | 2M     | 4M      | 7M      |
> | -------------------- | --- | ---- | ---- | ----- | ----- | ----- | ------ | ------ | ------- | ------- |
> | hash                 | 3.3 | 12.7 | 47   | 90.6  | 185.9 | 382.3 | 788.5  | 1622.4 | 3296    | 6248.7  |
> | tries                | 2   | 10   | 45.9 | 111.6 | 274.6 | 688.9 | 1747.2 | 4394.5 | 11229.8 | 27148.8 |
> | tst                  | 3.8 | 16.4 | 61.3 | 139.1 | 313.9 | 707.3 | 1641.3 | 3856.1 | 9002.3  | 19793.8 |

Yeah. It's hard to beat hash lookup when it's hitting a good case of O(1),
but what are the chances that it stays this way?
Are you saying you can size up the table and populate to good % just once?

If so it's probably better to replace all strings with something
like a good hash.
7M elements is not a lot. A hash producing 8 or 16 bytes will have close
to zero false positives.
And in case of "populate the table once" the hash seed can be
precomputed and adjusted, so you can guarantee zero collisions
for 7M strings. While lookup part can still have 0.001% chance
of a false positive there could be a way to deal with it after lookup.

> Ternary search tree always has better memory usage:
>
> | memory usage (MB) | 4K  | 16K | 64K  | 128K | 256K | 512K | 1M   | 2M    | 4M    | 7M     |
> | ----------------- | --- | --- | ---- | ---- | ---- | ---- | ---- | ----- | ----- | ------ |
> | hash              | 2.2 | 8.9 | 35.5 | 71   | 142  | 284  | 568  | 1136  | 2272  | 4302.5 |
> | tries             | 2.1 | 8.5 | 34   | 68   | 136  | 272  | 544  | 1088  | 2176  | 4106.9 |
> | tst               | 0.5 | 1.6 | 5.6  | 10.6 | 20.3 | 38.6 | 73.1 | 138.6 | 264.6 | 479.5  |
>

Ternary search tree looks amazing.
Since you have a prototype can you wrap it into a new type of bpf map
and post the patches?
I wonder what data structures look like to achieve such memory efficiency.
Hou Tao March 9, 2022, 11:47 a.m. UTC | #5
Hi,

On 2/27/2022 11:08 AM, Alexei Starovoitov wrote:
> On Sat, Feb 26, 2022 at 4:16 AM Hou Tao <houtao1@huawei.com> wrote:
>>
>> For now, our case is a write-once case, so only lookup is considered.
>> When data set is bigger than 128KB, hash table has better lookup performance as
>> show below:
>>
>> | lookup all elem (ms) | 4K  | 16K  | 64K  | 128K  | 256K  | 512K  | 1M     | 2M     | 4M      | 7M      |
>> | -------------------- | --- | ---- | ---- | ----- | ----- | ----- | ------ | ------ | ------- | ------- |
>> | hash                 | 3.3 | 12.7 | 47   | 90.6  | 185.9 | 382.3 | 788.5  | 1622.4 | 3296    | 6248.7  |
>> | tries                | 2   | 10   | 45.9 | 111.6 | 274.6 | 688.9 | 1747.2 | 4394.5 | 11229.8 | 27148.8 |
>> | tst                  | 3.8 | 16.4 | 61.3 | 139.1 | 313.9 | 707.3 | 1641.3 | 3856.1 | 9002.3  | 19793.8 |
> 
> Yeah. It's hard to beat hash lookup when it's hitting a good case of O(1),
> but what are the chances that it stays this way?
> Are you saying you can size up the table and populate to good % just once?
>
Yes, in our case the hash table is populated only once. In these tests
the hash table is first populated by inserting all strings into the
table, and then the lookups are done. The strings for all tests come
from the same string set.

> If so it's probably better to replace all strings with something
> like a good hash
A strong one like SHA-1, keeping the string as the hash-table value,
just as proposed in the previous email?

> 7M elements is not a lot. A hash producing 8 or 16 bytes will have close
> to zero false positives.
> And in case of "populate the table once" the hash seed can be
> precomputed and adjusted, so you can guarantee zero collisions
> for 7M strings. While lookup part can still have 0.001% chance
> of a false positive there could be a way to deal with it after lookup.
>
I can try the above method, but the lookup procedure will be slowed down
by calculating a good hash and the memory usage will not be reduced.

>> Ternary search tree always has better memory usage:
>>
>> | memory usage (MB) | 4K  | 16K | 64K  | 128K | 256K | 512K | 1M   | 2M    | 4M    | 7M     |
>> | ----------------- | --- | --- | ---- | ---- | ---- | ---- | ---- | ----- | ----- | ------ |
>> | hash              | 2.2 | 8.9 | 35.5 | 71   | 142  | 284  | 568  | 1136  | 2272  | 4302.5 |
>> | tries             | 2.1 | 8.5 | 34   | 68   | 136  | 272  | 544  | 1088  | 2176  | 4106.9 |
>> | tst               | 0.5 | 1.6 | 5.6  | 10.6 | 20.3 | 38.6 | 73.1 | 138.6 | 264.6 | 479.5  |
>>
> 
> Ternary search tree looks amazing.
> Since you have a prototype can you wrap it into a new type of bpf map
> and post the patches?
Will do.

> I wonder what data structures look like to achieve such memory efficiency.
The lower memory usage is partly because the test string set consists of
full file paths that share common prefixes, and the ternary search tree
reduces memory usage by storing each shared prefix only once.