From patchwork Mon May 4 15:40:35 2020
From: Nirmoy Das
To: dri-devel@lists.freedesktop.org
Subject: [PATCH v4 1/1] drm/mm: optimize rb_hole_addr rbtree search
Date: Mon, 4 May 2020 17:40:35 +0200
Message-Id: <20200504154035.17795-1-nirmoy.das@amd.com>
X-Mailer: git-send-email 2.26.1
Cc: kbuild test robot, nirmoy.aiemd@gmail.com, Nirmoy Das, christian.koenig@amd.com, chris@chris-wilson.co.uk

Userspace can severely fragment the rb_hole_addr rbtree by manipulating the
alignment while allocating buffers. A fragmented rb_hole_addr rbtree results
in large delays when allocating a buffer object for a userspace application.
Finding a suitable hole takes a long time because, if the first attempt
fails, we look for neighbouring nodes using rb_prev()/rb_next(), and
traversing the rbtree with rb_prev()/rb_next() can take very long when the
tree is fragmented.

This patch improves searches in a fragmented rb_hole_addr rbtree by turning
it into an augmented rbtree that stores an extra field in drm_mm_node,
subtree_max_hole. Each drm_mm_node now stores the maximum hole size of its
subtree in drm_mm_node->subtree_max_hole. Using subtree_max_hole, a complete
subtree can be eliminated when it cannot serve a request, reducing the
number of rb_prev()/rb_next() calls needed.

With this patch applied, 1 million bo allocs on amdgpu took ~8 sec, compared
to 50k bo allocs which took 28 sec without it.
partial test code:

int test_fragmentation(void)
{
	int i = 0;
	uint32_t minor_version;
	uint32_t major_version;
	struct amdgpu_bo_alloc_request request = {};
	amdgpu_bo_handle vram_handle[MAX_ALLOC] = {};
	amdgpu_device_handle device_handle;

	request.alloc_size = 4096;
	request.phys_alignment = 8192;
	request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM;

	int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);

	amdgpu_device_initialize(fd, &major_version, &minor_version,
				 &device_handle);

	for (i = 0; i < MAX_ALLOC; i++) {
		amdgpu_bo_alloc(device_handle, &request, &vram_handle[i]);
	}

	for (i = 0; i < MAX_ALLOC; i++)
		amdgpu_bo_free(vram_handle[i]);

	return 0;
}

v2: Use RB_DECLARE_CALLBACKS_MAX to maintain subtree_max_hole
v3: insert_hole_addr() should be a static function
    fix return value of next_hole_high_addr()/next_hole_low_addr()
    Reported-by: kbuild test robot
v4: Fix commit message.

Signed-off-by: Nirmoy Das
Reviewed-by: Chris Wilson
Acked-by: Christian König
---
 drivers/gpu/drm/drm_mm.c | 133 +++++++++++++++++++++++++++++++++------
 include/drm/drm_mm.h     |   1 +
 2 files changed, 115 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 8981abe8b7c9..f4ca1ff80af9 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -212,20 +212,6 @@ static void drm_mm_interval_tree_add_node(struct drm_mm_node *hole_node,
 			   &drm_mm_interval_tree_augment);
 }
 
-#define RB_INSERT(root, member, expr) do { \
-	struct rb_node **link = &root.rb_node, *rb = NULL; \
-	u64 x = expr(node); \
-	while (*link) { \
-		rb = *link; \
-		if (x < expr(rb_entry(rb, struct drm_mm_node, member))) \
-			link = &rb->rb_left; \
-		else \
-			link = &rb->rb_right; \
-	} \
-	rb_link_node(&node->member, rb, link); \
-	rb_insert_color(&node->member, &root); \
-} while (0)
-
 #define HOLE_SIZE(NODE) ((NODE)->hole_size)
 #define HOLE_ADDR(NODE) (__drm_mm_hole_node_start(NODE))
 
@@ -255,16 +241,42 @@ static void insert_hole_size(struct rb_root_cached *root,
 	rb_insert_color_cached(&node->rb_hole_size, root, first);
 }
 
+RB_DECLARE_CALLBACKS_MAX(static, augment_callbacks,
+			 struct drm_mm_node, rb_hole_addr,
+			 u64, subtree_max_hole, HOLE_SIZE)
+
+static void insert_hole_addr(struct rb_root *root, struct drm_mm_node *node)
+{
+	struct rb_node **link = &root->rb_node, *rb_parent = NULL;
+	u64 start = HOLE_ADDR(node), subtree_max_hole = node->subtree_max_hole;
+	struct drm_mm_node *parent;
+
+	while (*link) {
+		rb_parent = *link;
+		parent = rb_entry(rb_parent, struct drm_mm_node, rb_hole_addr);
+		if (parent->subtree_max_hole < subtree_max_hole)
+			parent->subtree_max_hole = subtree_max_hole;
+		if (start < HOLE_ADDR(parent))
+			link = &parent->rb_hole_addr.rb_left;
+		else
+			link = &parent->rb_hole_addr.rb_right;
+	}
+
+	rb_link_node(&node->rb_hole_addr, rb_parent, link);
+	rb_insert_augmented(&node->rb_hole_addr, root, &augment_callbacks);
+}
+
 static void add_hole(struct drm_mm_node *node)
 {
 	struct drm_mm *mm = node->mm;
 
 	node->hole_size =
 		__drm_mm_hole_node_end(node) - __drm_mm_hole_node_start(node);
+	node->subtree_max_hole = node->hole_size;
 	DRM_MM_BUG_ON(!drm_mm_hole_follows(node));
 
 	insert_hole_size(&mm->holes_size, node);
-	RB_INSERT(mm->holes_addr, rb_hole_addr, HOLE_ADDR);
+	insert_hole_addr(&mm->holes_addr, node);
 
 	list_add(&node->hole_stack, &mm->hole_stack);
 }
 
@@ -275,8 +287,10 @@ static void rm_hole(struct drm_mm_node *node)
 
 	list_del(&node->hole_stack);
 	rb_erase_cached(&node->rb_hole_size, &node->mm->holes_size);
-	rb_erase(&node->rb_hole_addr, &node->mm->holes_addr);
+	rb_erase_augmented(&node->rb_hole_addr, &node->mm->holes_addr,
+			   &augment_callbacks);
 	node->hole_size = 0;
+	node->subtree_max_hole = 0;
 
 	DRM_MM_BUG_ON(drm_mm_hole_follows(node));
 }
 
@@ -361,9 +375,90 @@ first_hole(struct drm_mm *mm,
 	}
 }
 
+/**
+ * next_hole_high_addr - returns next hole for a DRM_MM_INSERT_HIGH mode request
+ * @entry: previously selected drm_mm_node
+ * @size: size of the hole needed for the request
+ *
+ * This function will verify whether the left subtree of @entry has a hole big
+ * enough to fit the requested size. If so, it will return the previous node of
+ * @entry, or else it will return the parent node of @entry.
+ *
+ * It will also skip the complete left subtree if subtree_max_hole of that
+ * subtree is the same as the subtree_max_hole of @entry.
+ *
+ * Returns:
+ * The previous node of @entry if the left subtree of @entry can serve the
+ * request, or else the parent of @entry.
+ */
+static struct drm_mm_node *
+next_hole_high_addr(struct drm_mm_node *entry, u64 size)
+{
+	struct rb_node *rb_node, *left_rb_node, *parent_rb_node;
+	struct drm_mm_node *left_node;
+
+	if (!entry)
+		return NULL;
+
+	rb_node = &entry->rb_hole_addr;
+	if (rb_node->rb_left) {
+		left_rb_node = rb_node->rb_left;
+		parent_rb_node = rb_parent(rb_node);
+		left_node = rb_entry(left_rb_node,
+				     struct drm_mm_node, rb_hole_addr);
+		if ((left_node->subtree_max_hole < size ||
+		     entry->size == entry->subtree_max_hole) &&
+		    parent_rb_node && parent_rb_node->rb_left != rb_node)
+			return rb_hole_addr_to_node(parent_rb_node);
+	}
+
+	return rb_hole_addr_to_node(rb_prev(rb_node));
+}
+
+/**
+ * next_hole_low_addr - returns next hole for a DRM_MM_INSERT_LOW mode request
+ * @entry: previously selected drm_mm_node
+ * @size: size of the hole needed for the request
+ *
+ * This function will verify whether the right subtree of @entry has a hole big
+ * enough to fit the requested size. If so, it will return the next node of
+ * @entry, or else it will return the parent node of @entry.
+ *
+ * It will also skip the complete right subtree if subtree_max_hole of that
+ * subtree is the same as the subtree_max_hole of @entry.
+ *
+ * Returns:
+ * The next node of @entry if the right subtree of @entry can serve the
+ * request, or else the parent of @entry.
+ */
+static struct drm_mm_node *
+next_hole_low_addr(struct drm_mm_node *entry, u64 size)
+{
+	struct rb_node *rb_node, *right_rb_node, *parent_rb_node;
+	struct drm_mm_node *right_node;
+
+	if (!entry)
+		return NULL;
+
+	rb_node = &entry->rb_hole_addr;
+	if (rb_node->rb_right) {
+		right_rb_node = rb_node->rb_right;
+		parent_rb_node = rb_parent(rb_node);
+		right_node = rb_entry(right_rb_node,
+				      struct drm_mm_node, rb_hole_addr);
+		if ((right_node->subtree_max_hole < size ||
+		     entry->size == entry->subtree_max_hole) &&
+		    parent_rb_node && parent_rb_node->rb_right != rb_node)
+			return rb_hole_addr_to_node(parent_rb_node);
+	}
+
+	return rb_hole_addr_to_node(rb_next(rb_node));
+}
+
 static struct drm_mm_node *
 next_hole(struct drm_mm *mm,
 	  struct drm_mm_node *node,
+	  u64 size,
 	  enum drm_mm_insert_mode mode)
 {
 	switch (mode) {
@@ -372,10 +467,10 @@ next_hole(struct drm_mm *mm,
 		return rb_hole_size_to_node(rb_prev(&node->rb_hole_size));
 
 	case DRM_MM_INSERT_LOW:
-		return rb_hole_addr_to_node(rb_next(&node->rb_hole_addr));
+		return next_hole_low_addr(node, size);
 
 	case DRM_MM_INSERT_HIGH:
-		return rb_hole_addr_to_node(rb_prev(&node->rb_hole_addr));
+		return next_hole_high_addr(node, size);
 
 	case DRM_MM_INSERT_EVICT:
 		node = list_next_entry(node, hole_stack);
@@ -489,7 +584,7 @@ int drm_mm_insert_node_in_range(struct drm_mm * const mm,
 	remainder_mask = is_power_of_2(alignment) ? alignment - 1 : 0;
 	for (hole = first_hole(mm, range_start, range_end, size, mode);
 	     hole;
-	     hole = once ? NULL : next_hole(mm, hole, mode)) {
+	     hole = once ? NULL : next_hole(mm, hole, size, mode)) {
 		u64 hole_start = __drm_mm_hole_node_start(hole);
 		u64 hole_end = hole_start + hole->hole_size;
 		u64 adj_start, adj_end;
diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
index ee8b0e80ca90..a01bc6fac83c 100644
--- a/include/drm/drm_mm.h
+++ b/include/drm/drm_mm.h
@@ -168,6 +168,7 @@ struct drm_mm_node {
 	struct rb_node rb_hole_addr;
 	u64 __subtree_last;
 	u64 hole_size;
+	u64 subtree_max_hole;
 	unsigned long flags;
 #define DRM_MM_NODE_ALLOCATED_BIT	0
 #define DRM_MM_NODE_SCANNED_BIT	1