From patchwork Thu Mar 22 10:23:42 2018
X-Patchwork-Submitter: Thomas Hellstrom
X-Patchwork-Id: 10301181
From: Thomas Hellstrom
To: dri-devel@lists.freedesktop.org, linux-graphics-maintainer@vmware.com
Subject: [PATCH -next 01/11] drm/vmwgfx: Add a cpu blit utility that can be
 used for page-backed bos
Date: Thu, 22 Mar 2018 11:23:42 +0100
Message-Id: <20180322102352.2881-1-thellstrom@vmware.com>

The utility uses kmap_atomic() instead of vmapping the whole buffer
object. As a result there will be more book-keeping, but on some
architectures this will help avoid exhausting vmalloc space and also
avoid expensive TLB flushes.

The blit utility also adds a provision to compute a bounding box of
changed content, which is very useful to optimize presentation speed of
ill-behaved applications that don't supply proper damage regions, and
for page-flips. The cost of computing the bounding box is not that
expensive when done in a cpu-blit utility like this.
Signed-off-by: Thomas Hellstrom
Reviewed-by: Brian Paul
---
 drivers/gpu/drm/vmwgfx/Makefile      |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c | 506 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h  |  48 ++++
 3 files changed, 555 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c

diff --git a/drivers/gpu/drm/vmwgfx/Makefile b/drivers/gpu/drm/vmwgfx/Makefile
index ad80211e1098..794cc9d5c9b0 100644
--- a/drivers/gpu/drm/vmwgfx/Makefile
+++ b/drivers/gpu/drm/vmwgfx/Makefile
@@ -7,6 +7,6 @@ vmwgfx-y := vmwgfx_execbuf.o vmwgfx_gmr.o vmwgfx_kms.o vmwgfx_drv.o \
 	    vmwgfx_surface.o vmwgfx_prime.o vmwgfx_mob.o vmwgfx_shader.o \
 	    vmwgfx_cmdbuf_res.o vmwgfx_cmdbuf.o vmwgfx_stdu.o \
 	    vmwgfx_cotable.o vmwgfx_so.o vmwgfx_binding.o vmwgfx_msg.o \
-	    vmwgfx_simple_resource.o vmwgfx_va.o
+	    vmwgfx_simple_resource.o vmwgfx_va.o vmwgfx_blit.o
 
 obj-$(CONFIG_DRM_VMWGFX) := vmwgfx.o

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
new file mode 100644
index 000000000000..2730403e8df9
--- /dev/null
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
@@ -0,0 +1,506 @@
+/**************************************************************************
+ *
+ * Copyright © 2017 VMware, Inc., Palo Alto, CA., USA
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **************************************************************************/
+
+#include "vmwgfx_drv.h"
+
+/*
+ * Template that implements find_first_diff() for a generic
+ * unsigned integer type. @size and return value are in bytes.
+ */
+#define VMW_FIND_FIRST_DIFF(_type)				\
+static size_t vmw_find_first_diff_ ## _type			\
+	(const _type * dst, const _type * src, size_t size)	\
+{								\
+	size_t i;						\
+								\
+	for (i = 0; i < size; i += sizeof(_type)) {		\
+		if (*dst++ != *src++)				\
+			break;					\
+	}							\
+								\
+	return i;						\
+}
+
+
+/*
+ * Template that implements find_last_diff() for a generic
+ * unsigned integer type. Pointers point to the item following the
+ * *end* of the area to be examined. @size and return value are in
+ * bytes.
+ */
+#define VMW_FIND_LAST_DIFF(_type)				\
+static ssize_t vmw_find_last_diff_ ## _type(			\
+	const _type * dst, const _type * src, size_t size)	\
+{								\
+	while (size) {						\
+		if (*--dst != *--src)				\
+			break;					\
+								\
+		size -= sizeof(_type);				\
+	}							\
+	return size;						\
+}
+
+
+/*
+ * Instantiate find diff functions for relevant unsigned integer sizes,
+ * assuming that wider integers are faster (including aligning) up to the
+ * architecture native width, which is assumed to be 32 bit unless
+ * CONFIG_64BIT is defined.
+ */
+VMW_FIND_FIRST_DIFF(u8);
+VMW_FIND_LAST_DIFF(u8);
+
+VMW_FIND_FIRST_DIFF(u16);
+VMW_FIND_LAST_DIFF(u16);
+
+VMW_FIND_FIRST_DIFF(u32);
+VMW_FIND_LAST_DIFF(u32);
+
+#ifdef CONFIG_64BIT
+VMW_FIND_FIRST_DIFF(u64);
+VMW_FIND_LAST_DIFF(u64);
+#endif
+
+
+/* We use size aligned copies. This computes (addr - align(addr)) */
+#define SPILL(_var, _type) ((unsigned long) _var & (sizeof(_type) - 1))
+
+
+/*
+ * Template to compute find_first_diff() for a certain integer type
+ * including a head copy for alignment, and adjustment of parameters
+ * for tail find or increased resolution find using an unsigned integer find
+ * of smaller width. If finding is complete, and resolution is sufficient,
+ * the macro executes a return statement. Otherwise it falls through.
+ */
+#define VMW_TRY_FIND_FIRST_DIFF(_type)					\
+do {									\
+	unsigned int spill = SPILL(dst, _type);				\
+	size_t diff_offs;						\
+									\
+	if (spill && spill == SPILL(src, _type) &&			\
+	    sizeof(_type) - spill <= size) {				\
+		spill = sizeof(_type) - spill;				\
+		diff_offs = vmw_find_first_diff_u8(dst, src, spill);	\
+		if (diff_offs < spill)					\
+			return round_down(offset + diff_offs, granularity); \
+									\
+		dst += spill;						\
+		src += spill;						\
+		size -= spill;						\
+		offset += spill;					\
+		spill = 0;						\
+	}								\
+	if (!spill && !SPILL(src, _type)) {				\
+		size_t to_copy = size & ~(sizeof(_type) - 1);		\
+									\
+		diff_offs = vmw_find_first_diff_ ## _type		\
+			((_type *) dst, (_type *) src, to_copy);	\
+		if (diff_offs >= size || granularity == sizeof(_type))	\
+			return (offset + diff_offs);			\
+									\
+		dst += diff_offs;					\
+		src += diff_offs;					\
+		size -= diff_offs;					\
+		offset += diff_offs;					\
+	}								\
+} while (0)
+
+
+/**
+ * vmw_find_first_diff - find the first difference between dst and src
+ *
+ * @dst: The destination address
+ * @src: The source address
+ * @size: Number of bytes to compare
+ * @granularity: The granularity needed for the return value in bytes.
+ * return: The offset from find start where the first difference was
+ * encountered in bytes. If no difference was found, the function returns
+ * a value >= @size.
+ */
+static size_t vmw_find_first_diff(const u8 *dst, const u8 *src, size_t size,
+				  size_t granularity)
+{
+	size_t offset = 0;
+
+	/*
+	 * Try finding with large integers if alignment allows, or we can
+	 * fix it. Fall through if we need better resolution or alignment
+	 * was bad.
+	 */
+#ifdef CONFIG_64BIT
+	VMW_TRY_FIND_FIRST_DIFF(u64);
+#endif
+	VMW_TRY_FIND_FIRST_DIFF(u32);
+	VMW_TRY_FIND_FIRST_DIFF(u16);
+
+	return round_down(offset + vmw_find_first_diff_u8(dst, src, size),
+			  granularity);
+}
+
+
+/*
+ * Template to compute find_last_diff() for a certain integer type
+ * including a tail copy for alignment, and adjustment of parameters
+ * for head find or increased resolution find using an unsigned integer find
+ * of smaller width. If finding is complete, and resolution is sufficient,
+ * the macro executes a return statement. Otherwise it falls through.
+ */
+#define VMW_TRY_FIND_LAST_DIFF(_type)					\
+do {									\
+	unsigned int spill = SPILL(dst, _type);				\
+	ssize_t location;						\
+	ssize_t diff_offs;						\
+									\
+	if (spill && spill <= size && spill == SPILL(src, _type)) {	\
+		diff_offs = vmw_find_last_diff_u8(dst, src, spill);	\
+		if (diff_offs) {					\
+			location = size - spill + diff_offs - 1;	\
+			return round_down(location, granularity);	\
+		}							\
+									\
+		dst -= spill;						\
+		src -= spill;						\
+		size -= spill;						\
+		spill = 0;						\
+	}								\
+	if (!spill && !SPILL(src, _type)) {				\
+		size_t to_copy = round_down(size, sizeof(_type));	\
+									\
+		diff_offs = vmw_find_last_diff_ ## _type		\
+			((_type *) dst, (_type *) src, to_copy);	\
+		location = size - to_copy + diff_offs - sizeof(_type);	\
+		if (location < 0 || granularity == sizeof(_type))	\
+			return location;				\
+									\
+		dst -= to_copy - diff_offs;				\
+		src -= to_copy - diff_offs;				\
+		size -= to_copy - diff_offs;				\
+	}								\
+} while (0)
+
+
+/**
+ * vmw_find_last_diff - find the last difference between dst and src
+ *
+ * @dst: The destination address
+ * @src: The source address
+ * @size: Number of bytes to compare
+ * @granularity: The granularity needed for the return value in bytes.
+ * return: The offset from find start where the last difference was
+ * encountered in bytes, or a negative value if no difference was found.
+ */
+static ssize_t vmw_find_last_diff(const u8 *dst, const u8 *src, size_t size,
+				  size_t granularity)
+{
+	dst += size;
+	src += size;
+
+#ifdef CONFIG_64BIT
+	VMW_TRY_FIND_LAST_DIFF(u64);
+#endif
+	VMW_TRY_FIND_LAST_DIFF(u32);
+	VMW_TRY_FIND_LAST_DIFF(u16);
+
+	return round_down(vmw_find_last_diff_u8(dst, src, size) - 1,
+			  granularity);
+}
+
+
+/**
+ * vmw_memcpy - A wrapper around kernel memcpy that allows plugging it into a
+ * struct vmw_diff_cpy.
+ *
+ * @diff: The struct vmw_diff_cpy closure argument (unused).
+ * @dest: The copy destination.
+ * @src: The copy source.
+ * @n: Number of bytes to copy.
+ */
+void vmw_memcpy(struct vmw_diff_cpy *diff, u8 *dest, const u8 *src, size_t n)
+{
+	memcpy(dest, src, n);
+}
+
+
+/**
+ * vmw_adjust_rect - Adjust rectangle coordinates for newly found difference
+ *
+ * @diff: The struct vmw_diff_cpy used to track the modified bounding box.
+ * @diff_offs: The offset from @diff->line_offset where the difference was
+ * found.
+ */
+static void vmw_adjust_rect(struct vmw_diff_cpy *diff, size_t diff_offs)
+{
+	size_t offs = (diff_offs + diff->line_offset) / diff->cpp;
+	struct drm_rect *rect = &diff->rect;
+
+	rect->x1 = min_t(int, rect->x1, offs);
+	rect->x2 = max_t(int, rect->x2, offs + 1);
+	rect->y1 = min_t(int, rect->y1, diff->line);
+	rect->y2 = max_t(int, rect->y2, diff->line + 1);
+}
+
+/**
+ * vmw_diff_memcpy - memcpy that creates a bounding box of modified content.
+ *
+ * @diff: The struct vmw_diff_cpy used to track the modified bounding box.
+ * @dest: The copy destination.
+ * @src: The copy source.
+ * @n: Number of bytes to copy.
+ *
+ * In order to correctly track the modified content, the field @diff->line must
+ * be pre-loaded with the current line number, the field @diff->line_offset must
+ * be pre-loaded with the line offset in bytes where the copy starts, and
+ * finally the field @diff->cpp must be pre-loaded with the number of bytes
+ * per unit in the horizontal direction of the area we're examining.
+ * Typically bytes per pixel.
+ * This is needed to determine the granularity of the difference computing
+ * operations. A higher cpp generally leads to faster execution at the cost of
+ * bounding box width precision.
+ */
+void vmw_diff_memcpy(struct vmw_diff_cpy *diff, u8 *dest, const u8 *src,
+		     size_t n)
+{
+	ssize_t csize, byte_len;
+
+	if (WARN_ON_ONCE(round_down(n, diff->cpp) != n))
+		return;
+
+	/* TODO: Possibly use a single vmw_find_first_diff per line? */
+	csize = vmw_find_first_diff(dest, src, n, diff->cpp);
+	if (csize < n) {
+		vmw_adjust_rect(diff, csize);
+		byte_len = diff->cpp;
+
+		/*
+		 * Starting from where the first difference was found, find
+		 * the location of the last difference, and then copy.
+		 */
+		diff->line_offset += csize;
+		dest += csize;
+		src += csize;
+		n -= csize;
+		csize = vmw_find_last_diff(dest, src, n, diff->cpp);
+		if (csize >= 0) {
+			byte_len += csize;
+			vmw_adjust_rect(diff, csize);
+		}
+		memcpy(dest, src, byte_len);
+	}
+	diff->line_offset += n;
+}
+
+/**
+ * struct vmw_bo_blit_line_data - Convenience argument to vmw_bo_cpu_blit_line
+ *
+ * @mapped_dst: Already mapped destination page index in @dst_pages.
+ * @dst_addr: Kernel virtual address of mapped destination page.
+ * @dst_pages: Array of destination bo pages.
+ * @dst_num_pages: Number of destination bo pages.
+ * @dst_prot: Destination bo page protection.
+ * @mapped_src: Already mapped source page index in @src_pages.
+ * @src_addr: Kernel virtual address of mapped source page.
+ * @src_pages: Array of source bo pages.
+ * @src_num_pages: Number of source bo pages.
+ * @src_prot: Source bo page protection.
+ * @diff: Struct vmw_diff_cpy, in the end forwarded to the memcpy routine.
+ */
+struct vmw_bo_blit_line_data {
+	u32 mapped_dst;
+	u8 *dst_addr;
+	struct page **dst_pages;
+	u32 dst_num_pages;
+	pgprot_t dst_prot;
+	u32 mapped_src;
+	u8 *src_addr;
+	struct page **src_pages;
+	u32 src_num_pages;
+	pgprot_t src_prot;
+	struct vmw_diff_cpy *diff;
+};
+
+/**
+ * vmw_bo_cpu_blit_line - Blit part of a line from one bo to another.
+ *
+ * @d: Blit data as described above.
+ * @dst_offset: Destination copy start offset from start of bo.
+ * @src_offset: Source copy start offset from start of bo.
+ * @bytes_to_copy: Number of bytes to copy in this line.
+ */
+static int vmw_bo_cpu_blit_line(struct vmw_bo_blit_line_data *d,
+				u32 dst_offset,
+				u32 src_offset,
+				u32 bytes_to_copy)
+{
+	struct vmw_diff_cpy *diff = d->diff;
+
+	while (bytes_to_copy) {
+		u32 copy_size = bytes_to_copy;
+		u32 dst_page = dst_offset >> PAGE_SHIFT;
+		u32 src_page = src_offset >> PAGE_SHIFT;
+		u32 dst_page_offset = dst_offset & ~PAGE_MASK;
+		u32 src_page_offset = src_offset & ~PAGE_MASK;
+		bool unmap_dst = d->dst_addr && dst_page != d->mapped_dst;
+		bool unmap_src = d->src_addr && (src_page != d->mapped_src ||
+						 unmap_dst);
+
+		copy_size = min_t(u32, copy_size, PAGE_SIZE - dst_page_offset);
+		copy_size = min_t(u32, copy_size, PAGE_SIZE - src_page_offset);
+
+		if (unmap_src) {
+			ttm_kunmap_atomic_prot(d->src_addr, d->src_prot);
+			d->src_addr = NULL;
+		}
+
+		if (unmap_dst) {
+			ttm_kunmap_atomic_prot(d->dst_addr, d->dst_prot);
+			d->dst_addr = NULL;
+		}
+
+		if (!d->dst_addr) {
+			if (WARN_ON_ONCE(dst_page >= d->dst_num_pages))
+				return -EINVAL;
+
+			d->dst_addr =
+				ttm_kmap_atomic_prot(d->dst_pages[dst_page],
+						     d->dst_prot);
+			if (!d->dst_addr)
+				return -ENOMEM;
+
+			d->mapped_dst = dst_page;
+		}
+
+		if (!d->src_addr) {
+			if (WARN_ON_ONCE(src_page >= d->src_num_pages))
+				return -EINVAL;
+
+			d->src_addr =
+				ttm_kmap_atomic_prot(d->src_pages[src_page],
+						     d->src_prot);
+			if (!d->src_addr)
+				return -ENOMEM;
+
+			d->mapped_src = src_page;
+		}
+		diff->memcpy(diff, d->dst_addr + dst_page_offset,
+			     d->src_addr + src_page_offset, copy_size);
+
+		bytes_to_copy -= copy_size;
+		dst_offset += copy_size;
+		src_offset += copy_size;
+	}
+
+	return 0;
+}
+
+/**
+ * vmw_bo_cpu_blit - in-kernel cpu blit.
+ *
+ * @dst: Destination buffer object.
+ * @dst_offset: Destination offset of blit start in bytes.
+ * @dst_stride: Destination stride in bytes.
+ * @src: Source buffer object.
+ * @src_offset: Source offset of blit start in bytes.
+ * @src_stride: Source stride in bytes.
+ * @w: Width of blit.
+ * @h: Height of blit.
+ * @diff: The struct vmw_diff_cpy used for the copy and to track modified
+ * content.
+ * return: Zero on success. Negative error value on failure. Will print out
+ * kernel warnings on caller bugs.
+ *
+ * Performs a CPU blit from one buffer object to another avoiding a full
+ * bo vmap which may exhaust or fragment vmalloc space.
+ * On supported architectures (x86), we're using kmap_atomic which avoids
+ * cross-processor TLB and cache flushes and may, on non-HIGHMEM systems,
+ * reference already set-up mappings.
+ *
+ * Neither of the buffer objects may be placed in PCI memory
+ * (Fixed memory in TTM terminology) when using this function.
+ */
+int vmw_bo_cpu_blit(struct ttm_buffer_object *dst,
+		    u32 dst_offset, u32 dst_stride,
+		    struct ttm_buffer_object *src,
+		    u32 src_offset, u32 src_stride,
+		    u32 w, u32 h,
+		    struct vmw_diff_cpy *diff)
+{
+	struct ttm_operation_ctx ctx = {
+		.interruptible = false,
+		.no_wait_gpu = false
+	};
+	u32 j, initial_line = dst_offset / dst_stride;
+	struct vmw_bo_blit_line_data d;
+	int ret = 0;
+
+	/* Buffer objects need to be either pinned or reserved: */
+	if (!(dst->mem.placement & TTM_PL_FLAG_NO_EVICT))
+		lockdep_assert_held(&dst->resv->lock.base);
+	if (!(src->mem.placement & TTM_PL_FLAG_NO_EVICT))
+		lockdep_assert_held(&src->resv->lock.base);
+
+	if (dst->ttm->state == tt_unpopulated) {
+		ret = dst->ttm->bdev->driver->ttm_tt_populate(dst->ttm, &ctx);
+		if (ret)
+			return ret;
+	}
+
+	if (src->ttm->state == tt_unpopulated) {
+		ret = src->ttm->bdev->driver->ttm_tt_populate(src->ttm, &ctx);
+		if (ret)
+			return ret;
+	}
+
+	d.mapped_dst = 0;
+	d.mapped_src = 0;
+	d.dst_addr = NULL;
+	d.src_addr = NULL;
+	d.dst_pages = dst->ttm->pages;
+	d.src_pages = src->ttm->pages;
+	d.dst_num_pages = dst->num_pages;
+	d.src_num_pages = src->num_pages;
+	d.dst_prot = ttm_io_prot(dst->mem.placement, PAGE_KERNEL);
+	d.src_prot = ttm_io_prot(src->mem.placement, PAGE_KERNEL);
+	d.diff = diff;
+
+	for (j = 0; j < h; ++j) {
+		diff->line = j + initial_line;
+		diff->line_offset = dst_offset % dst_stride;
+		ret = vmw_bo_cpu_blit_line(&d, dst_offset, src_offset, w);
+		if (ret)
+			goto out;
+
+		dst_offset += dst_stride;
+		src_offset += src_stride;
+	}
+out:
+	if (d.src_addr)
+		ttm_kunmap_atomic_prot(d.src_addr, d.src_prot);
+	if (d.dst_addr)
+		ttm_kunmap_atomic_prot(d.dst_addr, d.dst_prot);
+
+	return ret;
+}

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index d08753e8fd94..15b22e56b8d1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -678,6 +678,7 @@ extern void vmw_fence_single_bo(struct ttm_buffer_object *bo,
 				struct vmw_fence_obj *fence);
 extern void vmw_resource_evict_all(struct vmw_private *dev_priv);
 
+
 /**
  * DMA buffer helper routines - vmwgfx_dmabuf.c
  */
@@ -1165,6 +1166,53 @@ extern int vmw_cmdbuf_cur_flush(struct vmw_cmdbuf_man *man,
 				bool interruptible);
 extern void vmw_cmdbuf_irqthread(struct vmw_cmdbuf_man *man);
 
+/* CPU blit utilities - vmwgfx_blit.c */
+
+/**
+ * struct vmw_diff_cpy - CPU blit information structure
+ *
+ * @rect: The output bounding box rectangle.
+ * @line: The current line of the blit.
+ * @line_offset: Offset of the current line segment.
+ * @cpp: Bytes per pixel (granularity information).
+ * @memcpy: Which memcpy function to use.
+ */
+struct vmw_diff_cpy {
+	struct drm_rect rect;
+	size_t line;
+	size_t line_offset;
+	int cpp;
+	void (*memcpy)(struct vmw_diff_cpy *diff, u8 *dest, const u8 *src,
+		       size_t n);
+};
+
+#define VMW_CPU_BLIT_INITIALIZER {	\
+	.memcpy = vmw_memcpy,		\
+}
+
+#define VMW_CPU_BLIT_DIFF_INITIALIZER(_cpp) {	\
+	.line = 0,				\
+	.line_offset = 0,			\
+	.rect = { .x1 = INT_MAX/2,		\
+		  .y1 = INT_MAX/2,		\
+		  .x2 = INT_MIN/2,		\
+		  .y2 = INT_MIN/2		\
+	},					\
+	.cpp = _cpp,				\
+	.memcpy = vmw_diff_memcpy,		\
+}
+
+void vmw_diff_memcpy(struct vmw_diff_cpy *diff, u8 *dest, const u8 *src,
+		     size_t n);
+
+void vmw_memcpy(struct vmw_diff_cpy *diff, u8 *dest, const u8 *src, size_t n);
+
+int vmw_bo_cpu_blit(struct ttm_buffer_object *dst,
+		    u32 dst_offset, u32 dst_stride,
+		    struct ttm_buffer_object *src,
+		    u32 src_offset, u32 src_stride,
+		    u32 w, u32 h,
+		    struct vmw_diff_cpy *diff);
 /**
  * Inline helper functions