From patchwork Thu Sep 13 04:52:50 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhang Chen X-Patchwork-Id: 10598685 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6E90213B6 for ; Thu, 13 Sep 2018 04:54:38 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 596912A2F7 for ; Thu, 13 Sep 2018 04:54:38 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 49CE92A63E; Thu, 13 Sep 2018 04:54:38 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 6103C2A2F7 for ; Thu, 13 Sep 2018 04:54:37 +0000 (UTC) Received: from localhost ([::1]:40428 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0JeG-0006dy-4V for patchwork-qemu-devel@patchwork.kernel.org; Thu, 13 Sep 2018 00:54:36 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52440) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0Jcr-0005uu-3U for qemu-devel@nongnu.org; Thu, 13 Sep 2018 00:53:10 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0Jcm-0004fM-Hq for qemu-devel@nongnu.org; Thu, 13 Sep 2018 00:53:09 -0400 Received: from mail-pl1-x641.google.com ([2607:f8b0:4864:20::641]:32886) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1g0Jcm-0004eD-9d for qemu-devel@nongnu.org; Thu, 13 Sep 2018 00:53:04 -0400 Received: by mail-pl1-x641.google.com with SMTP id b30-v6so2048493pla.0 for ; Wed, 12 Sep 2018 21:53:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id; bh=EgKpYGx01ycLLaBKjQ1rU7OJWRCsNz1qhtGEaO3RJ0Q=; b=p4NjAE+pnVCFeun6yXSFForBPVFIM1St5GD7BxACieepsv270eMqQKxx2KG7SEiC59 bqbApdfeNkPWM1i+Yei5uz11GHyPEt0P/nemu4GDS/vi9GN8zHzqNPSopei09yDPbvM0 6Rl54fhPBbI9VY+AgBB0RmZFhU5e/pG92WhyYwep3ac81Y2qmncZkdeh9qzI8IliOd2C ck1bQYKcOI6sN0mjoAm1yDFovIwQT0VAkMf+3TmtLP2ip7lel/+hIziqIZDy/Trp6bv3 3LU0z1S82Yn7rsFGhQKKuyeAbIm4dcGqQ/jjaNrGCKMZCqBvE2G5554gmACR4Ja1RChi Angw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=EgKpYGx01ycLLaBKjQ1rU7OJWRCsNz1qhtGEaO3RJ0Q=; b=AL9KPY+ocESMUwm8VQViktBO/QNO4YBS+Hb+ESWRcN33I0MQWGMk885lWJPTlw4DEk iApJOYCoQDO06YJPrssLwMXl6lAP8esjORxUNLn7NUe/DsAixqxq/pTAXOuWhKdX1exC CVYAh8nFdR/bfoJWZyLoO+hN9MVoQHoqiiycOsHHnIi3Hos3D0OIHdmNtXMsmU7vknqa 7U9IvzaIL7vWTdDT9nV/pgs4CeM72vJKWv5sUpWiYBgJ5LpWZPoIBifwKTxgeKHgBjuz PnH3N5iKZCDAUv5f1nN7EdRUwdsC0eY0hanzmsMqty86fcu+MZ/pr+nY7ongNynnjqhA FtYw== X-Gm-Message-State: APzg51CQwzioOJ2cYEFG91maJfyvGWVFBTANbfGuvH1EezSE9ZhEJjuo kuLIAtBkb3xSITuL1VEKlB2pmXHI X-Google-Smtp-Source: ANB0VdaQhbglt6ZrtH+YGA2VFv5c8zncbmYYSbzrs15PJrJNhucmY3T1yBGxymkAC+J7vNnUFeltPw== X-Received: by 2002:a17:902:7287:: with SMTP id d7-v6mr5524835pll.54.1536814381831; Wed, 12 Sep 2018 21:53:01 -0700 (PDT) Received: from zhangc-ss-3.c.lucid-bebop-198320.internal (155.150.229.35.bc.googleusercontent.com. [35.229.150.155]) by smtp.gmail.com with ESMTPSA id q25-v6sm3801398pfh.113.2018.09.12.21.52.58 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 12 Sep 2018 21:53:00 -0700 (PDT) From: Zhang Chen To: qemu-devel@nongnu.org, Paolo Bonzini , Juan Quintela , "Dr . David Alan Gilbert" , Jason Wang , Eric Blake , Markus Armbruster Date: Thu, 13 Sep 2018 04:52:50 +0000 Message-Id: <1536814370-12997-1-git-send-email-zhangckid@gmail.com> X-Mailer: git-send-email 2.7.4 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::641 Subject: [Qemu-devel] [PATCH V12-fix 01/19] filter-rewriter: Add TCP state machine and fix memory leak in connection_track_table X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Zhang Chen , zhanghailiang , Li Zhijian , Zhang Chen Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP We add almost full TCP state machine in filter-rewriter, except TCPS_LISTEN and some simplify in VM active close FIN states. The reason for this simplify job is because guest kernel will track the TCP status and wait 2MSL time too, if client resend the FIN packet, guest will resend the last ACK, so we needn't wait 2MSL time in filter-rewriter. After a net connection is closed, we didn't clear its releated resources in connection_track_table, which will lead to memory leak. Let't track the state of net connection, if it is closed, its related resources will be cleared up. Signed-off-by: zhanghailiang Signed-off-by: Zhang Chen Signed-off-by: Zhang Chen --- net/colo.c | 2 +- net/colo.h | 9 ++-- net/filter-rewriter.c | 109 ++++++++++++++++++++++++++++++++++++++---- 3 files changed, 104 insertions(+), 16 deletions(-) diff --git a/net/colo.c b/net/colo.c index 6dda4ed66e..97c8fc928f 100644 --- a/net/colo.c +++ b/net/colo.c @@ -137,7 +137,7 @@ Connection *connection_new(ConnectionKey *key) conn->ip_proto = key->ip_proto; conn->processing = false; conn->offset = 0; - conn->syn_flag = 0; + conn->tcp_state = TCPS_CLOSED; conn->pack = 0; conn->sack = 0; g_queue_init(&conn->primary_list); diff --git a/net/colo.h b/net/colo.h index da6c36dcf7..0277e0e9ba 100644 --- a/net/colo.h +++ b/net/colo.h @@ -18,6 +18,7 @@ #include "slirp/slirp.h" #include "qemu/jhash.h" #include "qemu/timer.h" +#include "slirp/tcp.h" #define HASHTABLE_MAX_SIZE 16384 @@ -81,11 +82,9 @@ typedef struct Connection { uint32_t sack; /* offset = secondary_seq - primary_seq */ tcp_seq offset; - /* - * we use this flag update offset func - * run once in independent tcp connection - */ - int syn_flag; + + int tcp_state; /* TCP FSM state */ + tcp_seq fin_ack_seq; /* the seq of 'fin=1,ack=1' */ } Connection; uint32_t connection_key_hash(const void *opaque); diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index f584e4eba4..dd323faf61 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -59,9 +59,9 @@ static int is_tcp_packet(Packet *pkt) } /* handle tcp packet from primary guest */ -static int handle_primary_tcp_pkt(NetFilterState *nf, +static int handle_primary_tcp_pkt(RewriterState *rf, Connection *conn, - Packet *pkt) + Packet *pkt, ConnectionKey *key) { struct tcphdr *tcp_pkt; @@ -74,23 +74,28 @@ static int handle_primary_tcp_pkt(NetFilterState *nf, trace_colo_filter_rewriter_conn_offset(conn->offset); } + if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN)) && + conn->tcp_state == TCPS_SYN_SENT) { + conn->tcp_state = TCPS_ESTABLISHED; + } + if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) { /* * we use this flag update offset func * run once in independent tcp connection */ - conn->syn_flag = 1; + conn->tcp_state = TCPS_SYN_RECEIVED; } if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK)) { - if (conn->syn_flag) { + if (conn->tcp_state == TCPS_SYN_RECEIVED) { /* * offset = secondary_seq - primary seq * ack packet sent by guest from primary node, * so we use th_ack - 1 get primary_seq */ conn->offset -= (ntohl(tcp_pkt->th_ack) - 1); - conn->syn_flag = 0; + conn->tcp_state = TCPS_ESTABLISHED; } if (conn->offset) { /* handle packets to the secondary from the primary */ @@ -99,15 +104,66 @@ static int handle_primary_tcp_pkt(NetFilterState *nf, net_checksum_calculate((uint8_t *)pkt->data + pkt->vnet_hdr_len, pkt->size - pkt->vnet_hdr_len); } + + /* + * Passive close step 3 + */ + if ((conn->tcp_state == TCPS_LAST_ACK) && + (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) { + conn->tcp_state = TCPS_CLOSED; + g_hash_table_remove(rf->connection_track_table, key); + } + } + + if ((tcp_pkt->th_flags & TH_FIN) == TH_FIN) { + /* + * Passive close. + * Step 1: + * The *server* side of this connect is VM, *client* tries to close + * the connection. We will into CLOSE_WAIT status. + * + * Step 2: + * In this step we will into LAST_ACK status. + * + * We got 'fin=1, ack=1' packet from server side, we need to + * record the seq of 'fin=1, ack=1' packet. + * + * Step 3: + * We got 'ack=1' packets from client side, it acks 'fin=1, ack=1' + * packet from server side. From this point, we can ensure that there + * will be no packets in the connection, except that, some errors + * happen between the path of 'filter object' and vNIC, if this rare + * case really happen, we can still create a new connection, + * So it is safe to remove the connection from connection_track_table. + * + */ + if (conn->tcp_state == TCPS_ESTABLISHED) { + conn->tcp_state = TCPS_CLOSE_WAIT; + } + + /* + * Active close step 2. + */ + if (conn->tcp_state == TCPS_FIN_WAIT_1) { + conn->tcp_state = TCPS_TIME_WAIT; + /* + * For simplify implementation, we needn't wait 2MSL time + * in filter rewriter. Because guest kernel will track the + * TCP status and wait 2MSL time, if client resend the FIN + * packet, guest will apply the last ACK too. + */ + conn->tcp_state = TCPS_CLOSED; + g_hash_table_remove(rf->connection_track_table, key); + } } return 0; } /* handle tcp packet from secondary guest */ -static int handle_secondary_tcp_pkt(NetFilterState *nf, +static int handle_secondary_tcp_pkt(RewriterState *rf, Connection *conn, - Packet *pkt) + Packet *pkt, ConnectionKey *key) { struct tcphdr *tcp_pkt; @@ -121,7 +177,8 @@ static int handle_secondary_tcp_pkt(NetFilterState *nf, trace_colo_filter_rewriter_conn_offset(conn->offset); } - if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) { + if (conn->tcp_state == TCPS_SYN_RECEIVED && + ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) { /* * save offset = secondary_seq and then * in handle_primary_tcp_pkt make offset @@ -130,6 +187,12 @@ static int handle_secondary_tcp_pkt(NetFilterState *nf, conn->offset = ntohl(tcp_pkt->th_seq); } + /* VM active connect */ + if (conn->tcp_state == TCPS_CLOSED && + ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) { + conn->tcp_state = TCPS_SYN_SENT; + } + if ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK) { /* Only need to adjust seq while offset is Non-zero */ if (conn->offset) { @@ -141,6 +204,32 @@ static int handle_secondary_tcp_pkt(NetFilterState *nf, } } + /* + * Passive close step 2: + */ + if (conn->tcp_state == TCPS_CLOSE_WAIT && + (tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) { + conn->fin_ack_seq = ntohl(tcp_pkt->th_seq); + conn->tcp_state = TCPS_LAST_ACK; + } + + /* + * Active close + * + * Step 1: + * The *server* side of this connect is VM, *server* tries to close + * the connection. + * + * Step 2: + * We will into CLOSE_WAIT status. + * We simplify the TCPS_FIN_WAIT_2, TCPS_TIME_WAIT and + * CLOSING status. + */ + if (conn->tcp_state == TCPS_ESTABLISHED && + (tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == TH_FIN) { + conn->tcp_state = TCPS_FIN_WAIT_1; + } + return 0; } @@ -190,7 +279,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf, if (sender == nf->netdev) { /* NET_FILTER_DIRECTION_TX */ - if (!handle_primary_tcp_pkt(nf, conn, pkt)) { + if (!handle_primary_tcp_pkt(s, conn, pkt, &key)) { qemu_net_queue_send(s->incoming_queue, sender, 0, (const uint8_t *)pkt->data, pkt->size, NULL); packet_destroy(pkt, NULL); @@ -203,7 +292,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf, } } else { /* NET_FILTER_DIRECTION_RX */ - if (!handle_secondary_tcp_pkt(nf, conn, pkt)) { + if (!handle_secondary_tcp_pkt(s, conn, pkt, &key)) { qemu_net_queue_send(s->incoming_queue, sender, 0, (const uint8_t *)pkt->data, pkt->size, NULL); packet_destroy(pkt, NULL);