From patchwork Wed Oct 16 13:46:56 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bas Nieuwenhuizen
X-Patchwork-Id: 11193395
From: Bas Nieuwenhuizen
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Cc: ddavenport@chromium.org, christian.koenig@amd.com
Subject: [RFC] drm: Add AMD GFX9+ format modifiers.
Date: Wed, 16 Oct 2019 15:46:56 +0200
Message-Id: <20191016134656.3396068-1-bas@basnieuwenhuizen.nl>
X-Mailer: git-send-email 2.23.0

This adds initial
format modifiers for AMD GFX9 and newer GPUs.

This is particularly useful to determine if we can use DCC, and whether
we need an extra display-compatible DCC metadata plane.

Design decisions:

- Always expose a single plane.

  This way everything works correctly with images with multiple planes.

- Do not add an extra memory region in DCC for putting a bit on whether
  we are in compressed state.

  A decompress on import is cheap enough if already decompressed, and I
  do think in most cases we can avoid it in advance during modifier
  negotiation. The remainder is probably not common enough to worry
  about.

- Explicitly define the sizes as part of the modifier description
  instead of using whatever the current version of radeonsi does.

  This way we can avoid dedicated buffers and we can make sure we keep
  compatibility across mesa versions. I'd like to put some tests for
  this on ac_surface.c so we can learn early in the process if things
  need to be changed. Furthermore, the lack of configurable strides on
  GFX10 means things already go wrong if we do not agree, making a
  custom stride somewhat less useful.

- No usage of BO metadata at all for modifier use cases.

  To avoid the requirement of dedicated dma bufs per image. For
  non-modifier based interop we still use the BO metadata, since we
  need to keep compatibility with old mesa and this is used for
  depth/msaa/3d/CL etc. API interop.

- A single FD for all planes.

  Easier in Vulkan / bindless and radeonsi is already transitioning.

- Make a single modifier for DCN1.

  It defines things uniquely given bpp, which we can assume, so adding
  more modifier values does not add clarity.

- Not exposing the 4K and 256B tiling modes.

  These are largely only better for something like a cursor or very
  long and/or tall images. Are they worth the added complexity to save
  memory? For context, at 32bpp, tiles are 128x128 pixels.

- For multiplane images, every plane uses the same tiling.

  On GFX9/GFX10 we can, so no need to make it complicated.

- We use family_id + external_rev to distinguish between incompatible
  GPUs.

  PCI ID is not enough, as RAVEN and RAVEN2 have the same PCI device
  id, but different tiling. We might be able to find bigger equivalence
  groups for _X, but especially for DCC I would be uncomfortable making
  it shared between GPUs.

- For DCN1 DCC, radeonsi currently uses another texelbuffer with
  indices to reorder. This is not shared.

  It is specific to the current implementation and does not need to be
  shared. To pave the way to a shader-based solution, let's keep this
  internal to each driver. This should reduce the modifier churn if any
  of the driver implementations change. (Especially as you'd want to
  support the old implementation for a while to stay compatible with
  old kernels not supporting a new modifier yet).

- No support for rotated swizzling.

  Can be added easily later and nothing in the stack would generate it
  currently.

- Add extra enum values in the definitions.

  This way we can easily switch on modifier without having to pass
  around the current GPU everywhere, assuming the modifier has been
  validated.
---

Since my previous attempt for modifiers got bogged down on details for
the GFX6-GFX8 modifiers in previous discussions, this only attempts to
define modifiers for GFX9+, which is significantly simpler.

For a final version I'd like to wait until I have written most of the
userspace + kernelspace so we can actually test it. However, I'd
appreciate any early feedback people are willing to give.

Initial Mesa amd/common support + tests are available at
https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/tree/modifiers

I tested the HW to actually behave as described in the descriptions on
Raven and plan to test on a subset of the others.
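To make the "switch on modifier" and "enforce all-zeros of unused bits"
points concrete, here is a rough sketch of how a consumer might build
and validate one of these modifiers. It is not part of the patch. The
sketch_* names are hypothetical, the 8-bit width of the id field is an
assumption made here (the patch deliberately leaves the field size
open), and rejecting GPU fields on the non-_X modifiers is one reading
of the "unused bits" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of DRM_FORMAT_MOD_VENDOR_AMD in drm_fourcc.h. */
#define SKETCH_VENDOR_AMD 0x02ULL
/* Highest id in the patch (DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id). */
#define SKETCH_MAX_ID 7

/* Hypothetical decoded view of a modifier; not a kernel structure. */
struct sketch_amd_mod_fields {
	uint8_t id;           /* enum-like value, bits 0..7 (assumed width) */
	uint8_t family_id;    /* bits 40..47, as in the patch macros */
	uint8_t external_rev; /* bits 48..55, as in the patch macros */
};

/* Mirrors what fourcc_mod_code(AMD, id | family << 40 | rev << 48) yields. */
static uint64_t sketch_amd_mod(uint64_t id, uint64_t family_id,
			       uint64_t external_rev)
{
	assert(id <= SKETCH_MAX_ID && family_id <= 0xff && external_rev <= 0xff);
	uint64_t val = id | (family_id << 40) | (external_rev << 48);
	return (SKETCH_VENDOR_AMD << 56) | (val & 0x00ffffffffffffffULL);
}

/* Returns false for anything that is not a well-formed AMD modifier. */
static bool sketch_amd_mod_decode(uint64_t mod,
				  struct sketch_amd_mod_fields *out)
{
	if ((mod >> 56) != SKETCH_VENDOR_AMD)
		return false;

	uint64_t id = mod & 0xff;
	uint64_t unused = (mod >> 8) & 0xffffffffULL; /* bits 8..39 */
	uint64_t family_id = (mod >> 40) & 0xff;
	uint64_t external_rev = (mod >> 48) & 0xff;

	if (id > SKETCH_MAX_ID || unused != 0)
		return false;
	/* ids 0 and 1 (64K_STANDARD/64K_DISPLAY) carry no GPU identification. */
	if (id <= 1 && (family_id | external_rev) != 0)
		return false;

	out->id = (uint8_t)id;
	out->family_id = (uint8_t)family_id;
	out->external_rev = (uint8_t)external_rev;
	return true;
}
```

After a successful decode a driver could switch on the id alone and
compare family_id/external_rev against the local GPU once, rather than
passing the GPU description around.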
 include/uapi/drm/drm_fourcc.h | 118 ++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 3feeaa3f987a..9bd286ab2bee 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -756,6 +756,124 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
 
+/*
+ * AMD GFX9+ format modifiers
+ */
+
+/*
+ * Enum-like values for easy switches.
+ *
+ * No fixed field size, but implementations are supposed to enforce all-zeros of
+ * unused bits during validation.
+ */
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id 0
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id 1
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id 2
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id 3
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id 4
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id 5
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id 6
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id 7
+
+/*
+ * Tiling modes that are compatible between all GPUs that support the tiling
+ * mode.
+ *
+ * STANDARD/DISPLAY/ROTATED + bitdepth determine the indexing within a 256-byte
+ * micro-block.
+ *
+ * The macro-block is 64 KiB and the micro-block in macro-block addressing is
+ * y0-x0-y1-x1-... up to the dimensions of the macro-block.
+ *
+ * The image is then a plain row-major image of macro-blocks.
+ */
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id)
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id)
+
+/*
+ * Same as above, but applies a transformation on the micro-block in macro-block
+ * indexing that depends on the GPU pipes, shader engines and banks.
+ *
+ * RENDER is a new micro-block tiling for GFX10+.
+ */
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD(family_id, external_rev) \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id | \
+			     ((uint64_t)(family_id) << 40) | \
+			     ((uint64_t)(external_rev) << 48))
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY(family_id, external_rev) \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id | \
+			     ((uint64_t)(family_id) << 40) | \
+			     ((uint64_t)(external_rev) << 48))
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER(family_id, external_rev) \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id | \
+			     ((uint64_t)(family_id) << 40) | \
+			     ((uint64_t)(external_rev) << 48))
+
+/*
+ * Same as above, but with DCC enabled.
+ *
+ * We add the family_id and external_rev of the device to make sure the
+ * transformation above is applied the same way, as well as make sure the
+ * implementation of DCC supports the same patterns.
+ *
+ * The DCC is pipe-aligned (and on GFX9 rb-aligned).
+ *
+ * This includes 2 memory regions per plane:
+ *  - main image
+ *  - DCC metadata
+ *
+ * These are tightly packed according to platform specific DCC alignment
+ * requirements.
+ *
+ * pipe+rb aligned DCC alignment:
+ *  - GFX9: MAX2(65536,
+ *               MIN2(32, pipes * shader_engines) *
+ *               num_backends * interleave_bytes)
+ *  - GFX10 (without rbplus): MAX2(pipes * interleave_bytes, 4096)
+ *
+ * aligned DCC size:
+ *  - GFX9:
+ *      tiles of MAX2(256 * num_backends KiB, 1 MiB) of pixel data (prefer
+ *      width if odd log2) at ratio 1/256
+ *  - GFX10 (without rbplus):
+ *      tiles of 256 * MAX2(pipes * interleave_bytes, 4096) of pixel data
+ *      (prefer width if odd log2) at ratio 1/256
+ */
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC(family_id, external_rev) \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id | \
+			     ((uint64_t)(family_id) << 40) | \
+			     ((uint64_t)(external_rev) << 48))
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC(family_id, external_rev) \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id | \
+			     ((uint64_t)(family_id) << 40) | \
+			     ((uint64_t)(external_rev) << 48))
+
+/*
+ * DCC that is displayable with DCN1 hardware.
+ *
+ * For bpp <= 32 bits, the micro-tiling is STANDARD and for bpp == 64 bits, the
+ * micro-tiling is DISPLAY.
+ *
+ * This includes 3 memory regions per plane:
+ *  - main image
+ *  - DCC (non-aligned)
+ *  - DCC (pipe-aligned & rb-aligned)
+ *
+ * non-aligned DCC alignment:
+ *  - GFX9: MAX2(65536, interleave_bytes)
+ *  - GFX10 (without rbplus): 4096
+ *
+ * non-aligned DCC size:
+ *  - GFX9 & GFX10 (without rbplus):
+ *      tiles for 1 MiB of pixel data (prefer width if odd log2) at ratio 1/256
+ */
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC(family_id, external_rev) \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id | \
+			     ((uint64_t)(family_id) << 40) | \
+			     ((uint64_t)(external_rev) << 48))
+
 #if defined(__cplusplus)
 }
 #endif
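
For reviewers who want to sanity-check the "pipe+rb aligned DCC
alignment" rules in the comment above, they can be written out roughly
as follows. This is only a sketch: the sketch_* function names are
hypothetical, and the parameters simply take their names from the
comment (pipes, shader_engines, num_backends, interleave_bytes), with
the caller assumed to know the values for a given GPU.

```c
#include <stdint.h>

static uint64_t sketch_max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

static uint64_t sketch_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * GFX9: MAX2(65536, MIN2(32, pipes * shader_engines) *
 *                   num_backends * interleave_bytes)
 */
static uint64_t sketch_gfx9_dcc_alignment(uint32_t pipes,
					  uint32_t shader_engines,
					  uint32_t num_backends,
					  uint32_t interleave_bytes)
{
	uint64_t pipe_se = sketch_min_u64(32, (uint64_t)pipes * shader_engines);

	return sketch_max_u64(65536, pipe_se * num_backends * interleave_bytes);
}

/* GFX10 (without rbplus): MAX2(pipes * interleave_bytes, 4096) */
static uint64_t sketch_gfx10_dcc_alignment(uint32_t pipes,
					   uint32_t interleave_bytes)
{
	return sketch_max_u64((uint64_t)pipes * interleave_bytes, 4096);
}
```

Both functions return a byte alignment. The DCC size rules are left out
on purpose, since "prefer width if odd log2" needs the tile shape
logic that lives in the Mesa branch linked in the notes.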