From patchwork Tue Mar 9 17:07:22 2021
X-Patchwork-Submitter: Shay Agroskin
X-Patchwork-Id: 12126161
From: Shay Agroskin
To: David Miller, Jakub Kicinski
CC: Shay Agroskin, "Woodhouse, David", "Machulsky, Zorik", "Matushevsky, Alexander", Saeed Bshara, "Wilson, Matt", "Liguori, Anthony", "Bshara, Nafea", "Tzalik, Guy", "Belgazal, Netanel", "Saidi, Ali", "Herrenschmidt, Benjamin", "Kiyanovski, Arthur", "Jubran, Samih", "Dagan, Noam"
Subject: [RFC Patch v1 0/3] Introduce ENA local page cache
Date: Tue, 9 Mar 2021 19:07:22 +0200
Message-ID: <20210309170725.2187138-1-shayagr@amazon.com>
X-Mailing-List: netdev@vger.kernel.org

A high incoming pps rate leads to frequent memory allocations by the
napi routine to refill the pages of the incoming packets. On several
new instances in the AWS fleet, with high pps rates, these frequent
allocations create contention between the different napi routines. The
contention happens because the routines end up accessing the buddy
allocator, which is a shared resource and requires lock-based
synchronization (the same lock is also held when freeing a page). In
our tests we observed that this contention causes the CPUs that serve
the RX queues to reach 100% utilization and damages network
performance.
While this contention can be relieved by making sure that pages are
allocated and freed on the same core, which would allow the driver to
take advantage of the per-CPU page (PCP) lists, this solution is not
always available or easy to maintain.

This patchset implements a page cache local to each RX queue. When the
napi routine allocates a page, it first checks whether the cache holds
a previously allocated page that is no longer in use. If so, that page
is fetched instead of allocating a new one. Otherwise, if the cache has
no free pages, a page is allocated through the normal allocation path
(PCP or buddy allocator) and returned to the caller. A page allocated
outside the cache is afterwards added to the cache, up to the cache's
maximum size (set to 2048 pages in this patchset).

Page availability is tracked through the page refcount. A cached page
has a refcount of 2 while it is held by the napi routine as an RX
buffer. Once the refcount of a page drops to 1, the cache assumes it is
free to be reused.

To avoid traversing all pages in the cache when looking for an
available page, we only check the availability of the oldest page
fetched for the RX queue that hasn't yet been returned to the cache
(i.e. still has a refcount greater than 1). For example, with a cache
of size 8 from which the pages at indices 0-7 were fetched (and placed
in the RX SQ), the next time napi tries to fetch a page from the cache,
the cache checks the availability of the page at index 0 and, if it is
available, hands that page to napi. On the following fetch, the cache
entry at index 1 is checked, and so on.

Memory consumption:
At maximum occupancy the cache holds 2048 pages per queue. For an
interface with 32 queues, 32 * 2048 * 4K = 256MB is used by the driver
for its RX queues. To avoid choking the system, this feature is only
enabled on instances with more than 16 queues, which in AWS come with
several tens of GiB of RAM.
Moreover, the feature can be turned off completely using ethtool.
Having said that, the worst-case memory cost of RX queues with 2K
entries is the same as that of 1K-entry queues plus LPC, while the
latter allocates the memory only when the traffic rate is higher than
the rate at which pages are freed.

Performance results:
4 c5n.18xlarge instances sending iperf TCP traffic to a p4d.24xlarge
instance. Packet size: 1500 bytes.

c5n.18xlarge specs: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz with
72 cores. 32 queue pairs.
p4d.24xlarge specs: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz with
96 cores. 4 * 32 = 128 (4 interfaces) queue pairs.

|                     | before | after |
|---------------------+--------+-------|
| bandwidth (Gbps)    |    260 |   330 |
| CPU utilization (%) |    100 |    56 |

Shay Agroskin (3):
  net: ena: implement local page cache (LPC) system
  net: ena: update README file with a description about LPC
  net: ena: support ethtool priv-flags and LPC state change

 .../device_drivers/ethernet/amazon/ena.rst    |  28 ++
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |  56 ++-
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 369 +++++++++++++++++-
 drivers/net/ethernet/amazon/ena/ena_netdev.h  |  32 ++
 4 files changed, 458 insertions(+), 27 deletions(-)
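Since patch 3 exposes the LPC state through ethtool private flags, the
toggle would presumably look like the sketch below. The flag name
"local_page_cache" and interface name "eth0" are assumptions, not
confirmed by this cover letter; check --show-priv-flags on a real ENA
interface for the actual name.

```shell
# List the private flags the driver exposes (exact names vary).
ethtool --show-priv-flags eth0

# Disable / re-enable the local page cache (flag name is a guess).
ethtool --set-priv-flags eth0 local_page_cache off
ethtool --set-priv-flags eth0 local_page_cache on
```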