diff mbox series

[2/4] net/colo: Fix a "double free" crash to clear the conn_list

Message ID 20220309083858.58117-3-chen.zhang@intel.com (mailing list archive)
State New, archived
Headers show
Series COLO net and runstate bugfix/optimization | expand

Commit Message

Zhang Chen March 9, 2022, 8:38 a.m. UTC
We notice the QEMU may crash when the guest has too many
incoming network connections with the following log:

15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
free(): invalid pointer
[1]    15195 abort (core dumped)  qemu-system-x86_64 ....

This is because we create the s->connection_track_table with
g_hash_table_new_full() which is defined as:

GHashTable * g_hash_table_new_full (GHashFunc hash_func,
                       GEqualFunc key_equal_func,
                       GDestroyNotify key_destroy_func,
                       GDestroyNotify value_destroy_func);

The fourth parameter connection_destroy() will be called to free the
memory allocated for all 'Connection' values in the hashtable when
we call g_hash_table_remove_all() in the connection_hashtable_reset().

It's unnecessary because we clear the conn_list explicitly later,
and it's buggy when other agents try to call connection_get()
with the same connection_track_table.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 net/colo-compare.c    | 2 +-
 net/filter-rewriter.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Zhijian Li (Fujitsu) March 21, 2022, 3:05 a.m. UTC | #1
On 09/03/2022 16:38, Zhang Chen wrote:
> We notice the QEMU may crash when the guest has too many
> incoming network connections with the following log:
>
> 15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
> free(): invalid pointer
> [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
>
> This is because we create the s->connection_track_table with
> g_hash_table_new_full() which is defined as:
>
> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
>                         GEqualFunc key_equal_func,
>                         GDestroyNotify key_destroy_func,
>                         GDestroyNotify value_destroy_func);
>
> The fourth parameter connection_destroy() will be called to free the
> memory allocated for all 'Connection' values in the hashtable when
> we call g_hash_table_remove_all() in the connection_hashtable_reset().
>
> It's unnecessary because we clear the conn_list explicitly later,
> and it's buggy when other agents try to call connection_get()
> with the same connection_track_table.
>
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> ---
>   net/colo-compare.c    | 2 +-
>   net/filter-rewriter.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/colo-compare.c b/net/colo-compare.c
> index 62554b5b3c..ab054cfd21 100644
> --- a/net/colo-compare.c
> +++ b/net/colo-compare.c
> @@ -1324,7 +1324,7 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
>       s->connection_track_table = g_hash_table_new_full(connection_key_hash,
>                                                         connection_key_equal,
>                                                         g_free,
> -                                                      connection_destroy);
> +                                                      NULL);


202 /* if not found, create a new connection and add to hash table */
203 Connection *connection_get(GHashTable *connection_track_table,
204                            ConnectionKey *key,
205                            GQueue *conn_list)
206 {
207     Connection *conn = g_hash_table_lookup(connection_track_table, key);
208
209     if (conn == NULL) {
210         ConnectionKey *new_key = g_memdup(key, sizeof(*key));
211
212         conn = connection_new(key);
213
214         if (g_hash_table_size(connection_track_table) > HASHTABLE_MAX_SIZE) {
215             trace_colo_proxy_main("colo proxy connection hashtable full,"
216                                   " clear it");
217 connection_hashtable_reset(connection_track_table);

197 void connection_hashtable_reset(GHashTable *connection_track_table)
198 {
199 g_hash_table_remove_all(connection_track_table);
200 }

IIUC,  above subroutine will do some cleanup explicitly. And before your patch, connection_hashtable_reset()
will release all keys and their values in this hashtable. But now, you remove all keys and just
one value(conn_list) instead. Does it means other values will be leaked ?


218 /*
219              * clear the conn_list
220 */
221             while (!g_queue_is_empty(conn_list)) {
222 connection_destroy(g_queue_pop_head(conn_list));
223 }
224 }
225
226         g_hash_table_insert(connection_track_table, new_key, conn);
227 }
228
229     return conn;
230 }


Thanks
Zhijian

>   
>       colo_compare_iothread(s);
>   
> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
> index bf05023dc3..c18c4c2019 100644
> --- a/net/filter-rewriter.c
> +++ b/net/filter-rewriter.c
> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
>       s->connection_track_table = g_hash_table_new_full(connection_key_hash,
>                                                         connection_key_equal,
>                                                         g_free,
> -                                                      connection_destroy);
> +                                                      NULL);
>       s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>   }
>
Zhang Chen March 28, 2022, 9:13 a.m. UTC | #2
> -----Original Message-----
> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> Sent: Monday, March 21, 2022 11:06 AM
> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> <jasowang@redhat.com>; lizhijian@fujitsu.com
> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
> conn_list
> 
> 
> 
> On 09/03/2022 16:38, Zhang Chen wrote:
> > We notice the QEMU may crash when the guest has too many incoming
> > network connections with the following log:
> >
> > 15197@1593578622.668573:colo_proxy_main : colo proxy connection
> > hashtable full, clear it
> > free(): invalid pointer
> > [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
> >
> > This is because we create the s->connection_track_table with
> > g_hash_table_new_full() which is defined as:
> >
> > GHashTable * g_hash_table_new_full (GHashFunc hash_func,
> >                         GEqualFunc key_equal_func,
> >                         GDestroyNotify key_destroy_func,
> >                         GDestroyNotify value_destroy_func);
> >
> > The fourth parameter connection_destroy() will be called to free the
> > memory allocated for all 'Connection' values in the hashtable when we
> > call g_hash_table_remove_all() in the connection_hashtable_reset().
> >
> > It's unnecessary because we clear the conn_list explicitly later, and
> > it's buggy when other agents try to call connection_get() with the
> > same connection_track_table.
> >
> > Signed-off-by: Like Xu <like.xu@linux.intel.com>
> > Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> > ---
> >   net/colo-compare.c    | 2 +-
> >   net/filter-rewriter.c | 2 +-
> >   2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/colo-compare.c b/net/colo-compare.c index
> > 62554b5b3c..ab054cfd21 100644
> > --- a/net/colo-compare.c
> > +++ b/net/colo-compare.c
> > @@ -1324,7 +1324,7 @@ static void
> colo_compare_complete(UserCreatable *uc, Error **errp)
> >       s->connection_track_table =
> g_hash_table_new_full(connection_key_hash,
> >                                                         connection_key_equal,
> >                                                         g_free,
> > -                                                      connection_destroy);
> > +                                                      NULL);
> 
> 
> 202 /* if not found, create a new connection and add to hash table */
> 203 Connection *connection_get(GHashTable *connection_track_table,
> 204                            ConnectionKey *key,
> 205                            GQueue *conn_list)
> 206 {
> 207     Connection *conn = g_hash_table_lookup(connection_track_table,
> key);
> 208
> 209     if (conn == NULL) {
> 210         ConnectionKey *new_key = g_memdup(key, sizeof(*key));
> 211
> 212         conn = connection_new(key);
> 213
> 214         if (g_hash_table_size(connection_track_table) >
> HASHTABLE_MAX_SIZE) {
> 215             trace_colo_proxy_main("colo proxy connection hashtable full,"
> 216                                   " clear it");
> 217 connection_hashtable_reset(connection_track_table);
> 
> 197 void connection_hashtable_reset(GHashTable *connection_track_table)
> 198 {
> 199 g_hash_table_remove_all(connection_track_table);
> 200 }
> 
> IIUC,  above subroutine will do some cleanup explicitly. And before your
> patch, connection_hashtable_reset() will release all keys and their values in
> this hashtable. But now, you remove all keys and just one value(conn_list)
> instead. Does it means other values will be leaked ?

Thanks Zhijian. Re-think about it. Yes, I think you are right.
It looks need a lock to avoid into connection_get() when triggered the clear hashtable operation.
What do you think?

Thanks
Chen


> 
> 
> 218 /*
> 219              * clear the conn_list
> 220 */
> 221             while (!g_queue_is_empty(conn_list)) {
> 222 connection_destroy(g_queue_pop_head(conn_list));
> 223 }
> 224 }
> 225
> 226         g_hash_table_insert(connection_track_table, new_key, conn);
> 227 }
> 228
> 229     return conn;
> 230 }
> 
> 
> Thanks
> Zhijian
> 
> >
> >       colo_compare_iothread(s);
> >
> > diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
> > bf05023dc3..c18c4c2019 100644
> > --- a/net/filter-rewriter.c
> > +++ b/net/filter-rewriter.c
> > @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf,
> Error **errp)
> >       s->connection_track_table =
> g_hash_table_new_full(connection_key_hash,
> >                                                         connection_key_equal,
> >                                                         g_free,
> > -                                                      connection_destroy);
> > +                                                      NULL);
> >       s->incoming_queue =
> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> >   }
> >
Zhijian Li (Fujitsu) March 31, 2022, 1:14 a.m. UTC | #3
connection_track_table
-----+----------
key1 | conn    |-----------------------------------------------------------+
-----+----------                                                           |
key2 | conn    |----------------------------------+                        |
-----+----------                                  |                        |
key3 | conn    |---------+                        |                        |
-----+----------         |                        |                        |
                          |                        |                        |
                          |                        |                        |
     + CompareState ++    |                        |                        |
     |               |    V                        V                        V
     +---------------+   +---------------+         +---------------+
     |conn_list      +--->conn           +--------->conn           |     connx
     +---------------+   +---------------+         +---------------+
     |               |     |           |             |          |
     +---------------+ +---v----+  +---v----+    +---v----+ +---v----+
                       |primary |  |secondary    |primary | |secondary
                       |packet  |  |packet  +    |packet  | |packet  +
                       +--------+  +--------+    +--------+ +--------+
                           |           |             |          |
                       +---v----+  +---v----+    +---v----+ +---v----+
                       |primary |  |secondary    |primary | |secondary
                       |packet  |  |packet  +    |packet  | |packet  +
                       +--------+  +--------+    +--------+ +--------+
                           |           |             |          |
                       +---v----+  +---v----+    +---v----+ +---v----+
                       |primary |  |secondary    |primary | |secondary
                       |packet  |  |packet  +    |packet  | |packet  +
                       +--------+  +--------+    +--------+ +--------+
  
I recalled that we should above relationships between connection_track_table conn_list and conn.
That means both connection_track_table and conn_list reference to the same conn instance.

So before this patch, connection_get() is possible to use-after-free/double free conn. where 1st was in
connection_hashtable_reset() and 2nd was
221             while (!g_queue_is_empty(conn_list)) {
222                 connection_destroy(g_queue_pop_head(conn_list));
223             }

I also doubt that your current abort was just due to above use-after-free/double free.
If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd place.

Thanks
Zhijian


On 28/03/2022 17:13, Zhang, Chen wrote:
>
>> -----Original Message-----
>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
>> Sent: Monday, March 21, 2022 11:06 AM
>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
>> <jasowang@redhat.com>; lizhijian@fujitsu.com
>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
>> conn_list
>>
>>
>>
>> On 09/03/2022 16:38, Zhang Chen wrote:
>>> We notice the QEMU may crash when the guest has too many incoming
>>> network connections with the following log:
>>>
>>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection
>>> hashtable full, clear it
>>> free(): invalid pointer
>>> [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
>>>
>>> This is because we create the s->connection_track_table with
>>> g_hash_table_new_full() which is defined as:
>>>
>>> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
>>>                          GEqualFunc key_equal_func,
>>>                          GDestroyNotify key_destroy_func,
>>>                          GDestroyNotify value_destroy_func);
>>>
>>> The fourth parameter connection_destroy() will be called to free the
>>> memory allocated for all 'Connection' values in the hashtable when we
>>> call g_hash_table_remove_all() in the connection_hashtable_reset().
>>>
>>> It's unnecessary because we clear the conn_list explicitly later, and
>>> it's buggy when other agents try to call connection_get() with the
>>> same connection_track_table.
>>>
>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>>> ---
>>>    net/colo-compare.c    | 2 +-
>>>    net/filter-rewriter.c | 2 +-
>>>    2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index
>>> 62554b5b3c..ab054cfd21 100644
>>> --- a/net/colo-compare.c
>>> +++ b/net/colo-compare.c
>>> @@ -1324,7 +1324,7 @@ static void
>> colo_compare_complete(UserCreatable *uc, Error **errp)
>>>        s->connection_track_table =
>> g_hash_table_new_full(connection_key_hash,
>>>                                                          connection_key_equal,
>>>                                                          g_free,
>>> -                                                      connection_destroy);
>>> +                                                      NULL);
>>
>> 202 /* if not found, create a new connection and add to hash table */
>> 203 Connection *connection_get(GHashTable *connection_track_table,
>> 204                            ConnectionKey *key,
>> 205                            GQueue *conn_list)
>> 206 {
>> 207     Connection *conn = g_hash_table_lookup(connection_track_table,
>> key);
>> 208
>> 209     if (conn == NULL) {
>> 210         ConnectionKey *new_key = g_memdup(key, sizeof(*key));
>> 211
>> 212         conn = connection_new(key);
>> 213
>> 214         if (g_hash_table_size(connection_track_table) >
>> HASHTABLE_MAX_SIZE) {
>> 215             trace_colo_proxy_main("colo proxy connection hashtable full,"
>> 216                                   " clear it");
>> 217 connection_hashtable_reset(connection_track_table);
>>
>> 197 void connection_hashtable_reset(GHashTable *connection_track_table)
>> 198 {
>> 199 g_hash_table_remove_all(connection_track_table);
>> 200 }
>>
>> IIUC,  above subroutine will do some cleanup explicitly. And before your
>> patch, connection_hashtable_reset() will release all keys and their values in
>> this hashtable. But now, you remove all keys and just one value(conn_list)
>> instead. Does it means other values will be leaked ?
> Thanks Zhijian. Re-think about it. Yes, I think you are right.
> It looks need a lock to avoid into connection_get() when triggered the clear hashtable operation.
> What do you think?
>
> Thanks
> Chen
>
>
>>
>> 218 /*
>> 219              * clear the conn_list
>> 220 */
>> 221             while (!g_queue_is_empty(conn_list)) {
>> 222 connection_destroy(g_queue_pop_head(conn_list));
>> 223 }
>> 224 }
>> 225
>> 226         g_hash_table_insert(connection_track_table, new_key, conn);
>> 227 }
>> 228
>> 229     return conn;
>> 230 }
>>
>>
>> Thanks
>> Zhijian
>>
>>>        colo_compare_iothread(s);
>>>
>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
>>> bf05023dc3..c18c4c2019 100644
>>> --- a/net/filter-rewriter.c
>>> +++ b/net/filter-rewriter.c
>>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf,
>> Error **errp)
>>>        s->connection_track_table =
>> g_hash_table_new_full(connection_key_hash,
>>>                                                          connection_key_equal,
>>>                                                          g_free,
>>> -                                                      connection_destroy);
>>> +                                                      NULL);
>>>        s->incoming_queue =
>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>>>    }
>>>
Zhang Chen March 31, 2022, 2:25 a.m. UTC | #4
> -----Original Message-----
> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> Sent: Thursday, March 31, 2022 9:15 AM
> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> <jasowang@redhat.com>
> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
> conn_list
> 
> 
> connection_track_table
> -----+----------
> key1 | conn    |-----------------------------------------------------------+
> -----+----------                                                           |
> key2 | conn    |----------------------------------+                        |
> -----+----------                                  |                        |
> key3 | conn    |---------+                        |                        |
> -----+----------         |                        |                        |
>                           |                        |                        |
>                           |                        |                        |
>      + CompareState ++    |                        |                        |
>      |               |    V                        V                        V
>      +---------------+   +---------------+         +---------------+
>      |conn_list      +--->conn           +--------->conn           |     connx
>      +---------------+   +---------------+         +---------------+
>      |               |     |           |             |          |
>      +---------------+ +---v----+  +---v----+    +---v----+ +---v----+
>                        |primary |  |secondary    |primary | |secondary
>                        |packet  |  |packet  +    |packet  | |packet  +
>                        +--------+  +--------+    +--------+ +--------+
>                            |           |             |          |
>                        +---v----+  +---v----+    +---v----+ +---v----+
>                        |primary |  |secondary    |primary | |secondary
>                        |packet  |  |packet  +    |packet  | |packet  +
>                        +--------+  +--------+    +--------+ +--------+
>                            |           |             |          |
>                        +---v----+  +---v----+    +---v----+ +---v----+
>                        |primary |  |secondary    |primary | |secondary
>                        |packet  |  |packet  +    |packet  | |packet  +
>                        +--------+  +--------+    +--------+ +--------+
> 
> I recalled that we should above relationships between
> connection_track_table conn_list and conn.
> That means both connection_track_table and conn_list reference to the
> same conn instance.
> 
> So before this patch, connection_get() is possible to use-after-free/double
> free conn. where 1st was in
> connection_hashtable_reset() and 2nd was
> 221             while (!g_queue_is_empty(conn_list)) {
> 222                 connection_destroy(g_queue_pop_head(conn_list));
> 223             }
> 
> I also doubt that your current abort was just due to above use-after-
> free/double free.
> If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd
> place.

Make sense, but It also means the original patch works here, skip free conn in connection_hashtable_reset() and do it in:
221             while (!g_queue_is_empty(conn_list)) {
 222                 connection_destroy(g_queue_pop_head(conn_list));
 223             }.
It also avoid use-after-free/double free conn.
Maybe we can keep the original version to fix it?

Thanks
Chen

> 
> Thanks
> Zhijian
> 
> 
> On 28/03/2022 17:13, Zhang, Chen wrote:
> >
> >> -----Original Message-----
> >> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> >> Sent: Monday, March 21, 2022 11:06 AM
> >> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> >> <jasowang@redhat.com>; lizhijian@fujitsu.com
> >> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu
> >> <like.xu@linux.intel.com>
> >> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear
> >> the conn_list
> >>
> >>
> >>
> >> On 09/03/2022 16:38, Zhang Chen wrote:
> >>> We notice the QEMU may crash when the guest has too many incoming
> >>> network connections with the following log:
> >>>
> >>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection
> >>> hashtable full, clear it
> >>> free(): invalid pointer
> >>> [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
> >>>
> >>> This is because we create the s->connection_track_table with
> >>> g_hash_table_new_full() which is defined as:
> >>>
> >>> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
> >>>                          GEqualFunc key_equal_func,
> >>>                          GDestroyNotify key_destroy_func,
> >>>                          GDestroyNotify value_destroy_func);
> >>>
> >>> The fourth parameter connection_destroy() will be called to free the
> >>> memory allocated for all 'Connection' values in the hashtable when
> >>> we call g_hash_table_remove_all() in the connection_hashtable_reset().
> >>>
> >>> It's unnecessary because we clear the conn_list explicitly later,
> >>> and it's buggy when other agents try to call connection_get() with
> >>> the same connection_track_table.
> >>>
> >>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> >>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> >>> ---
> >>>    net/colo-compare.c    | 2 +-
> >>>    net/filter-rewriter.c | 2 +-
> >>>    2 files changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/net/colo-compare.c b/net/colo-compare.c index
> >>> 62554b5b3c..ab054cfd21 100644
> >>> --- a/net/colo-compare.c
> >>> +++ b/net/colo-compare.c
> >>> @@ -1324,7 +1324,7 @@ static void
> >> colo_compare_complete(UserCreatable *uc, Error **errp)
> >>>        s->connection_track_table =
> >> g_hash_table_new_full(connection_key_hash,
> >>>                                                          connection_key_equal,
> >>>                                                          g_free,
> >>> -                                                      connection_destroy);
> >>> +                                                      NULL);
> >>
> >> 202 /* if not found, create a new connection and add to hash table */
> >> 203 Connection *connection_get(GHashTable *connection_track_table,
> >> 204                            ConnectionKey *key,
> >> 205                            GQueue *conn_list)
> >> 206 {
> >> 207     Connection *conn =
> >> g_hash_table_lookup(connection_track_table,
> >> key);
> >> 208
> >> 209     if (conn == NULL) {
> >> 210         ConnectionKey *new_key = g_memdup(key, sizeof(*key));
> >> 211
> >> 212         conn = connection_new(key);
> >> 213
> >> 214         if (g_hash_table_size(connection_track_table) >
> >> HASHTABLE_MAX_SIZE) {
> >> 215             trace_colo_proxy_main("colo proxy connection hashtable full,"
> >> 216                                   " clear it");
> >> 217 connection_hashtable_reset(connection_track_table);
> >>
> >> 197 void connection_hashtable_reset(GHashTable
> >> *connection_track_table)
> >> 198 {
> >> 199 g_hash_table_remove_all(connection_track_table);
> >> 200 }
> >>
> >> IIUC,  above subroutine will do some cleanup explicitly. And before
> >> your patch, connection_hashtable_reset() will release all keys and
> >> their values in this hashtable. But now, you remove all keys and just
> >> one value(conn_list) instead. Does it means other values will be leaked ?
> > Thanks Zhijian. Re-think about it. Yes, I think you are right.
> > It looks need a lock to avoid into connection_get() when triggered the clear
> hashtable operation.
> > What do you think?
> >
> > Thanks
> > Chen
> >
> >
> >>
> >> 218 /*
> >> 219              * clear the conn_list
> >> 220 */
> >> 221             while (!g_queue_is_empty(conn_list)) {
> >> 222 connection_destroy(g_queue_pop_head(conn_list));
> >> 223 }
> >> 224 }
> >> 225
> >> 226         g_hash_table_insert(connection_track_table, new_key,
> >> conn);
> >> 227 }
> >> 228
> >> 229     return conn;
> >> 230 }
> >>
> >>
> >> Thanks
> >> Zhijian
> >>
> >>>        colo_compare_iothread(s);
> >>>
> >>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
> >>> bf05023dc3..c18c4c2019 100644
> >>> --- a/net/filter-rewriter.c
> >>> +++ b/net/filter-rewriter.c
> >>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState
> >>> *nf,
> >> Error **errp)
> >>>        s->connection_track_table =
> >> g_hash_table_new_full(connection_key_hash,
> >>>                                                          connection_key_equal,
> >>>                                                          g_free,
> >>> -                                                      connection_destroy);
> >>> +                                                      NULL);
> >>>        s->incoming_queue =
> >> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> >>>    }
> >>>
Zhijian Li (Fujitsu) April 1, 2022, 1:46 a.m. UTC | #5
On 31/03/2022 10:25, Zhang, Chen wrote:
>
>> -----Original Message-----
>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
>> Sent: Thursday, March 31, 2022 9:15 AM
>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
>> <jasowang@redhat.com>
>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
>> conn_list
>>
>>
>> connection_track_table
>> -----+----------
>> key1 | conn    |-----------------------------------------------------------+
>> -----+----------                                                           |
>> key2 | conn    |----------------------------------+                        |
>> -----+----------                                  |                        |
>> key3 | conn    |---------+                        |                        |
>> -----+----------         |                        |                        |
>>                            |                        |                        |
>>                            |                        |                        |
>>       + CompareState ++    |                        |                        |
>>       |               |    V                        V                        V
>>       +---------------+   +---------------+         +---------------+
>>       |conn_list      +--->conn           +--------->conn           |     connx
>>       +---------------+   +---------------+         +---------------+
>>       |               |     |           |             |          |
>>       +---------------+ +---v----+  +---v----+    +---v----+ +---v----+
>>                         |primary |  |secondary    |primary | |secondary
>>                         |packet  |  |packet  +    |packet  | |packet  +
>>                         +--------+  +--------+    +--------+ +--------+
>>                             |           |             |          |
>>                         +---v----+  +---v----+    +---v----+ +---v----+
>>                         |primary |  |secondary    |primary | |secondary
>>                         |packet  |  |packet  +    |packet  | |packet  +
>>                         +--------+  +--------+    +--------+ +--------+
>>                             |           |             |          |
>>                         +---v----+  +---v----+    +---v----+ +---v----+
>>                         |primary |  |secondary    |primary | |secondary
>>                         |packet  |  |packet  +    |packet  | |packet  +
>>                         +--------+  +--------+    +--------+ +--------+
>>
>> I recalled that we should above relationships between
>> connection_track_table conn_list and conn.
>> That means both connection_track_table and conn_list reference to the
>> same conn instance.
>>
>> So before this patch, connection_get() is possible to use-after-free/double
>> free conn. where 1st was in
>> connection_hashtable_reset() and 2nd was
>> 221             while (!g_queue_is_empty(conn_list)) {
>> 222                 connection_destroy(g_queue_pop_head(conn_list));
>> 223             }
>>
>> I also doubt that your current abort was just due to above use-after-
>> free/double free.
>> If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd
>> place.
> Make sense, but It also means the original patch works here, skip free conn in connection_hashtable_reset() and do it in:
> 221             while (!g_queue_is_empty(conn_list)) {
>   222                 connection_destroy(g_queue_pop_head(conn_list));
>   223             }.
> It also avoid use-after-free/double free conn.
Although you will not use-after-free here, you have to consider other situations carefully that
g_hash_table_remove_all() g_hash_table_destroy() were called where the conn_list should also be freed
with you approach.




> Maybe we can keep the original version to fix it?
And your commit log should be more clear.

Thanks
Zhijian

>
> Thanks
> Chen
>
>> Thanks
>> Zhijian
>>
>>
>> On 28/03/2022 17:13, Zhang, Chen wrote:
>>>> -----Original Message-----
>>>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
>>>> Sent: Monday, March 21, 2022 11:06 AM
>>>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
>>>> <jasowang@redhat.com>; lizhijian@fujitsu.com
>>>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu
>>>> <like.xu@linux.intel.com>
>>>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear
>>>> the conn_list
>>>>
>>>>
>>>>
>>>> On 09/03/2022 16:38, Zhang Chen wrote:
>>>>> We notice the QEMU may crash when the guest has too many incoming
>>>>> network connections with the following log:
>>>>>
>>>>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection
>>>>> hashtable full, clear it
>>>>> free(): invalid pointer
>>>>> [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
>>>>>
>>>>> This is because we create the s->connection_track_table with
>>>>> g_hash_table_new_full() which is defined as:
>>>>>
>>>>> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
>>>>>                           GEqualFunc key_equal_func,
>>>>>                           GDestroyNotify key_destroy_func,
>>>>>                           GDestroyNotify value_destroy_func);
>>>>>
>>>>> The fourth parameter connection_destroy() will be called to free the
>>>>> memory allocated for all 'Connection' values in the hashtable when
>>>>> we call g_hash_table_remove_all() in the connection_hashtable_reset().
>>>>>
>>>>> It's unnecessary because we clear the conn_list explicitly later,
>>>>> and it's buggy when other agents try to call connection_get() with
>>>>> the same connection_track_table.
>>>>>
>>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>>>>> ---
>>>>>     net/colo-compare.c    | 2 +-
>>>>>     net/filter-rewriter.c | 2 +-
>>>>>     2 files changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index
>>>>> 62554b5b3c..ab054cfd21 100644
>>>>> --- a/net/colo-compare.c
>>>>> +++ b/net/colo-compare.c
>>>>> @@ -1324,7 +1324,7 @@ static void
>>>> colo_compare_complete(UserCreatable *uc, Error **errp)
>>>>>         s->connection_track_table =
>>>> g_hash_table_new_full(connection_key_hash,
>>>>>                                                           connection_key_equal,
>>>>>                                                           g_free,
>>>>> -                                                      connection_destroy);
>>>>> +                                                      NULL);
>>>> 202 /* if not found, create a new connection and add to hash table */
>>>> 203 Connection *connection_get(GHashTable *connection_track_table,
>>>> 204                            ConnectionKey *key,
>>>> 205                            GQueue *conn_list)
>>>> 206 {
>>>> 207     Connection *conn =
>>>> g_hash_table_lookup(connection_track_table,
>>>> key);
>>>> 208
>>>> 209     if (conn == NULL) {
>>>> 210         ConnectionKey *new_key = g_memdup(key, sizeof(*key));
>>>> 211
>>>> 212         conn = connection_new(key);
>>>> 213
>>>> 214         if (g_hash_table_size(connection_track_table) >
>>>> HASHTABLE_MAX_SIZE) {
>>>> 215             trace_colo_proxy_main("colo proxy connection hashtable full,"
>>>> 216                                   " clear it");
>>>> 217 connection_hashtable_reset(connection_track_table);
>>>>
>>>> 197 void connection_hashtable_reset(GHashTable
>>>> *connection_track_table)
>>>> 198 {
>>>> 199 g_hash_table_remove_all(connection_track_table);
>>>> 200 }
>>>>
>>>> IIUC,  above subroutine will do some cleanup explicitly. And before
>>>> your patch, connection_hashtable_reset() will release all keys and
>>>> their values in this hashtable. But now, you remove all keys and just
>>>> one value(conn_list) instead. Does it means other values will be leaked ?
>>> Thanks Zhijian. Re-think about it. Yes, I think you are right.
>>> It looks need a lock to avoid into connection_get() when triggered the clear
>> hashtable operation.
>>> What do you think?
>>>
>>> Thanks
>>> Chen
>>>
>>>
>>>> 218 /*
>>>> 219              * clear the conn_list
>>>> 220 */
>>>> 221             while (!g_queue_is_empty(conn_list)) {
>>>> 222 connection_destroy(g_queue_pop_head(conn_list));
>>>> 223 }
>>>> 224 }
>>>> 225
>>>> 226         g_hash_table_insert(connection_track_table, new_key,
>>>> conn);
>>>> 227 }
>>>> 228
>>>> 229     return conn;
>>>> 230 }
>>>>
>>>>
>>>> Thanks
>>>> Zhijian
>>>>
>>>>>         colo_compare_iothread(s);
>>>>>
>>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
>>>>> bf05023dc3..c18c4c2019 100644
>>>>> --- a/net/filter-rewriter.c
>>>>> +++ b/net/filter-rewriter.c
>>>>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState
>>>>> *nf,
>>>> Error **errp)
>>>>>         s->connection_track_table =
>>>> g_hash_table_new_full(connection_key_hash,
>>>>>                                                           connection_key_equal,
>>>>>                                                           g_free,
>>>>> -                                                      connection_destroy);
>>>>> +                                                      NULL);
>>>>>         s->incoming_queue =
>>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>>>>>     }
>>>>>
Zhang Chen April 1, 2022, 3:33 a.m. UTC | #6
> -----Original Message-----
> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> Sent: Friday, April 1, 2022 9:47 AM
> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> <jasowang@redhat.com>
> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
> conn_list
> 
> 
> 
> On 31/03/2022 10:25, Zhang, Chen wrote:
> >
> >> -----Original Message-----
> >> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> >> Sent: Thursday, March 31, 2022 9:15 AM
> >> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> >> <jasowang@redhat.com>
> >> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu
> >> <like.xu@linux.intel.com>
> >> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear
> >> the conn_list
> >>
> >>
> >> connection_track_table
> >> -----+----------
> >> key1 | conn    |-----------------------------------------------------------+
> >> -----+----------                                                           |
> >> key2 | conn    |----------------------------------+                        |
> >> -----+----------                                  |                        |
> >> key3 | conn    |---------+                        |                        |
> >> -----+----------         |                        |                        |
> >>                            |                        |                        |
> >>                            |                        |                        |
> >>       + CompareState ++    |                        |                        |
> >>       |               |    V                        V                        V
> >>       +---------------+   +---------------+         +---------------+
> >>       |conn_list      +--->conn           +--------->conn           |     connx
> >>       +---------------+   +---------------+         +---------------+
> >>       |               |     |           |             |          |
> >>       +---------------+ +---v----+  +---v----+    +---v----+ +---v----+
> >>                         |primary |  |secondary    |primary | |secondary
> >>                         |packet  |  |packet  +    |packet  | |packet  +
> >>                         +--------+  +--------+    +--------+ +--------+
> >>                             |           |             |          |
> >>                         +---v----+  +---v----+    +---v----+ +---v----+
> >>                         |primary |  |secondary    |primary | |secondary
> >>                         |packet  |  |packet  +    |packet  | |packet  +
> >>                         +--------+  +--------+    +--------+ +--------+
> >>                             |           |             |          |
> >>                         +---v----+  +---v----+    +---v----+ +---v----+
> >>                         |primary |  |secondary    |primary | |secondary
> >>                         |packet  |  |packet  +    |packet  | |packet  +
> >>                         +--------+  +--------+    +--------+ +--------+
> >>
> >> I recalled that we should above relationships between
> >> connection_track_table conn_list and conn.
> >> That means both connection_track_table and conn_list reference to the
> >> same conn instance.
> >>
> >> So before this patch, connection_get() is possible to
> >> use-after-free/double free conn. where 1st was in
> >> connection_hashtable_reset() and 2nd was
> >> 221             while (!g_queue_is_empty(conn_list)) {
> >> 222                 connection_destroy(g_queue_pop_head(conn_list));
> >> 223             }
> >>
> >> I also doubt that your current abort was just due to above use-after-
> >> free/double free.
> >> If so, looks it's enough we just update to g_queue_clear(conn_list)
> >> in the 2nd place.
> > Make sense, but It also means the original patch works here, skip free conn
> in connection_hashtable_reset() and do it in:
> > 221             while (!g_queue_is_empty(conn_list)) {
> >   222                 connection_destroy(g_queue_pop_head(conn_list));
> >   223             }.
> > It also avoid use-after-free/double free conn.
> Although you will not use-after-free here, you have to consider other
> situations carefully that
> g_hash_table_remove_all() g_hash_table_destroy() were called where the
> conn_list should also be freed with you approach.
> 
> 

I re-checked the code, it looks fine to me.

> 
> 
> > Maybe we can keep the original version to fix it?
> And your commit log should be more clear.

OK, I will update V2 for commit log.

Thanks
Chen 

> 
> Thanks
> Zhijian
> 
> >
> > Thanks
> > Chen
> >
> >> Thanks
> >> Zhijian
> >>
> >>
> >> On 28/03/2022 17:13, Zhang, Chen wrote:
> >>>> -----Original Message-----
> >>>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> >>>> Sent: Monday, March 21, 2022 11:06 AM
> >>>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> >>>> <jasowang@redhat.com>; lizhijian@fujitsu.com
> >>>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu
> >>>> <like.xu@linux.intel.com>
> >>>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to
> >>>> clear the conn_list
> >>>>
> >>>>
> >>>>
> >>>> On 09/03/2022 16:38, Zhang Chen wrote:
> >>>>> We notice the QEMU may crash when the guest has too many
> incoming
> >>>>> network connections with the following log:
> >>>>>
> >>>>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection
> >>>>> hashtable full, clear it
> >>>>> free(): invalid pointer
> >>>>> [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
> >>>>>
> >>>>> This is because we create the s->connection_track_table with
> >>>>> g_hash_table_new_full() which is defined as:
> >>>>>
> >>>>> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
> >>>>>                           GEqualFunc key_equal_func,
> >>>>>                           GDestroyNotify key_destroy_func,
> >>>>>                           GDestroyNotify value_destroy_func);
> >>>>>
> >>>>> The fourth parameter connection_destroy() will be called to free
> >>>>> the memory allocated for all 'Connection' values in the hashtable
> >>>>> when we call g_hash_table_remove_all() in the
> connection_hashtable_reset().
> >>>>>
> >>>>> It's unnecessary because we clear the conn_list explicitly later,
> >>>>> and it's buggy when other agents try to call connection_get() with
> >>>>> the same connection_track_table.
> >>>>>
> >>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> >>>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> >>>>> ---
> >>>>>     net/colo-compare.c    | 2 +-
> >>>>>     net/filter-rewriter.c | 2 +-
> >>>>>     2 files changed, 2 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index
> >>>>> 62554b5b3c..ab054cfd21 100644
> >>>>> --- a/net/colo-compare.c
> >>>>> +++ b/net/colo-compare.c
> >>>>> @@ -1324,7 +1324,7 @@ static void
> >>>> colo_compare_complete(UserCreatable *uc, Error **errp)
> >>>>>         s->connection_track_table =
> >>>> g_hash_table_new_full(connection_key_hash,
> >>>>>                                                           connection_key_equal,
> >>>>>                                                           g_free,
> >>>>> -                                                      connection_destroy);
> >>>>> +                                                      NULL);
> >>>> 202 /* if not found, create a new connection and add to hash table
> >>>> */
> >>>> 203 Connection *connection_get(GHashTable *connection_track_table,
> >>>> 204                            ConnectionKey *key,
> >>>> 205                            GQueue *conn_list)
> >>>> 206 {
> >>>> 207     Connection *conn =
> >>>> g_hash_table_lookup(connection_track_table,
> >>>> key);
> >>>> 208
> >>>> 209     if (conn == NULL) {
> >>>> 210         ConnectionKey *new_key = g_memdup(key, sizeof(*key));
> >>>> 211
> >>>> 212         conn = connection_new(key);
> >>>> 213
> >>>> 214         if (g_hash_table_size(connection_track_table) >
> >>>> HASHTABLE_MAX_SIZE) {
> >>>> 215             trace_colo_proxy_main("colo proxy connection hashtable
> full,"
> >>>> 216                                   " clear it");
> >>>> 217 connection_hashtable_reset(connection_track_table);
> >>>>
> >>>> 197 void connection_hashtable_reset(GHashTable
> >>>> *connection_track_table)
> >>>> 198 {
> >>>> 199 g_hash_table_remove_all(connection_track_table);
> >>>> 200 }
> >>>>
> >>>> IIUC,  above subroutine will do some cleanup explicitly. And before
> >>>> your patch, connection_hashtable_reset() will release all keys and
> >>>> their values in this hashtable. But now, you remove all keys and
> >>>> just one value(conn_list) instead. Does it means other values will be
> leaked ?
> >>> Thanks Zhijian. Re-think about it. Yes, I think you are right.
> >>> It looks need a lock to avoid into connection_get() when triggered
> >>> the clear
> >> hashtable operation.
> >>> What do you think?
> >>>
> >>> Thanks
> >>> Chen
> >>>
> >>>
> >>>> 218 /*
> >>>> 219              * clear the conn_list
> >>>> 220 */
> >>>> 221             while (!g_queue_is_empty(conn_list)) {
> >>>> 222 connection_destroy(g_queue_pop_head(conn_list));
> >>>> 223 }
> >>>> 224 }
> >>>> 225
> >>>> 226         g_hash_table_insert(connection_track_table, new_key,
> >>>> conn);
> >>>> 227 }
> >>>> 228
> >>>> 229     return conn;
> >>>> 230 }
> >>>>
> >>>>
> >>>> Thanks
> >>>> Zhijian
> >>>>
> >>>>>         colo_compare_iothread(s);
> >>>>>
> >>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
> >>>>> bf05023dc3..c18c4c2019 100644
> >>>>> --- a/net/filter-rewriter.c
> >>>>> +++ b/net/filter-rewriter.c
> >>>>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState
> >>>>> *nf,
> >>>> Error **errp)
> >>>>>         s->connection_track_table =
> >>>> g_hash_table_new_full(connection_key_hash,
> >>>>>                                                           connection_key_equal,
> >>>>>                                                           g_free,
> >>>>> -                                                      connection_destroy);
> >>>>> +                                                      NULL);
> >>>>>         s->incoming_queue =
> >>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> >>>>>     }
> >>>>>
diff mbox series

Patch

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 62554b5b3c..ab054cfd21 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -1324,7 +1324,7 @@  static void colo_compare_complete(UserCreatable *uc, Error **errp)
     s->connection_track_table = g_hash_table_new_full(connection_key_hash,
                                                       connection_key_equal,
                                                       g_free,
-                                                      connection_destroy);
+                                                      NULL);
 
     colo_compare_iothread(s);
 
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index bf05023dc3..c18c4c2019 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -383,7 +383,7 @@  static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
     s->connection_track_table = g_hash_table_new_full(connection_key_hash,
                                                       connection_key_equal,
                                                       g_free,
-                                                      connection_destroy);
+                                                      NULL);
     s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
 }