Message ID | 20250225-apple-twiddled-modifiers-v2-1-cf69729e87f6@rosenzweig.io (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] drm: add modifiers for Apple GPU layouts | expand |
Hi, On Tue, 25 Feb 2025 at 21:35, Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote: > These layouts are notably not used for interchange across hardware > blocks (e.g. with the display controller). There are other layouts for > that but we don't support them either in userspace or kernelspace yet > (even downstream), so we don't add modifiers here. Yeah, when those have users with good definitions matching these, we can add them here, even if they are downstream. Anything the GPU would share out of context, or the codec, would be good for this. > +/* > + * Apple GPU-tiled layout. > + * > + * GPU-tiled images are divided into tiles. Tiles are always 16KiB, with > + * dimensions depending on the base-format. Within a tile, pixels are fully > + * interleaved (Morton order). Tiles themselves are raster-order. > + * > + * Images must be 16-byte aligned. > + * > + * For more information see > + * https://docs.mesa3d.org/drivers/asahi.html#image-layouts This writeup is really nice. I would prefer it was inlined here though (similar to AFBC), with Mesa then referring to the kernel, but tbf Vivante does have a similar external reference. The one thing it's missing is an explicit description of how stride is computed and used. I can see the 'obvious' way to do it for this and compression, but it would be good to make it explicit, given some misadventures in the past. CCS is probably the gold standard to refer to here. I'd be very happy to merge it with that. Cheers, Daniel
> > These layouts are notably not used for interchange across hardware > > blocks (e.g. with the display controller). There are other layouts for > > that but we don't support them either in userspace or kernelspace yet > > (even downstream), so we don't add modifiers here. > > Yeah, when those have users with good definitions matching these, we > can add them here, even if they are downstream. Anything the GPU would > share out of context, or the codec, would be good for this. Sure. The mentioned "other layouts" are for scanout (GPU->display), and apparently the GPU can render but not sample them... You can imagine about how well that would go without a vendor extension + compositor patches...... (Currently we use linear framebuffers for scanout and avoid that rat's nest.) > > > +/* > > + * Apple GPU-tiled layout. > > + * > > + * GPU-tiled images are divided into tiles. Tiles are always 16KiB, with > > + * dimensions depending on the base-format. Within a tile, pixels are fully > > + * interleaved (Morton order). Tiles themselves are raster-order. > > + * > > + * Images must be 16-byte aligned. > > + * > > + * For more information see > > + * https://docs.mesa3d.org/drivers/asahi.html#image-layouts > > This writeup is really nice. I would prefer it was inlined here though > (similar to AFBC), with Mesa then referring to the kernel, but tbf > Vivante does have a similar external reference. Thanks :-) I wasn't sure which way would be better. Most of the complexity in that writeup relates to mipmapping and arrayed images, which I don't think WSI hits...? Also, the Mesa docs are easier for me to update, support tables and LaTeX, have other bits of hw writeups on the same page, etc... so they feel like a better home for the unabridged version. And since Vivante did this, I figured it was ok. If people feel strongly I can of course inline it, it's just not clear to me that dumping all that info into the header here is actually desired. (And there's even more info Marge queued ... https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33743/diffs?commit_id=5ddb57ceb46d42096a34cbef1df6b4109ac2b511 ) > The one thing it's missing is an explicit description of how stride is > computed and used. I can see the 'obvious' way to do it for this and > compression, but it would be good to make it explicit, given some > misadventures in the past. CCS is probably the gold standard to refer > to here. Ah, right -- thanks for raising that! I'll add this for v3. Indeed, I picked the "obvious" way, given said misadventures with AFBC ;) This is the relevant Mesa code: /* * For WSI purposes, we need to associate a stride with all layouts. * In the hardware, only strided linear images have an associated * stride, there is no natural stride associated with twiddled * images. However, various clients assert that the stride is valid * for the image if it were linear (even if it is in fact not * linear). In those cases, by convention we use the minimum valid * such stride. */ static inline uint32_t ail_get_wsi_stride_B(const struct ail_layout *layout, unsigned level) { assert(level == 0 && "Mipmaps cannot be shared as WSI"); if (layout->tiling == AIL_TILING_LINEAR) return ail_get_linear_stride_B(layout, level); else return util_format_get_stride(layout->format, layout->width_px); } I can either copy that comment here, or to make that logic more explicit: Producers must set the stride to the image width in pixels times the bytes per pixel. This is a software convention, the hardware does not use this stride. Thanks, Alyssa
Hey, On Tue, 25 Feb 2025 at 22:18, Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote: > > > These layouts are notably not used for interchange across hardware > > > blocks (e.g. with the display controller). There are other layouts for > > > that but we don't support them either in userspace or kernelspace yet > > > (even downstream), so we don't add modifiers here. > > > > Yeah, when those have users with good definitions matching these, we > > can add them here, even if they are downstream. Anything the GPU would > > share out of context, or the codec, would be good for this. > > Sure. The mentioned "other layouts" are for scanout (GPU->display), and > apparently the GPU can render but not sample them... You can imagine > about how well that would go without a vendor extension + compositor > patches...... > > (Currently we use linear framebuffers for scanout and avoid that rat's > nest.) Heh, impressive. Are those the twiddled-but-not-tiled ones? > > > +/* > > > + * Apple GPU-tiled layout. > > > + * > > > + * GPU-tiled images are divided into tiles. Tiles are always 16KiB, with > > > + * dimensions depending on the base-format. Within a tile, pixels are fully > > > + * interleaved (Morton order). Tiles themselves are raster-order. > > > + * > > > + * Images must be 16-byte aligned. > > > + * > > > + * For more information see > > > + * https://docs.mesa3d.org/drivers/asahi.html#image-layouts > > > > This writeup is really nice. I would prefer it was inlined here though > > (similar to AFBC), with Mesa then referring to the kernel, but tbf > > Vivante does have a similar external reference. > > Thanks :-) I wasn't sure which way would be better. Most of the > complexity in that writeup relates to mipmapping and arrayed images, > which I don't think WSI hits...? Yeah, anything that wouldn't be exported out of a GPU API context doesn't need to be in here! > Also, the Mesa docs are easier for me > to update, support tables and LaTeX, have other bits of hw writeups on > the same page, etc... so they feel like a better home for the unabridged > version. And since Vivante did this, I figured it was ok. > > If people feel strongly I can of course inline it, it's just not clear > to me that dumping all that info into the header here is actually > desired. (And there's even more info Marge queued ... > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33743/diffs?commit_id=5ddb57ceb46d42096a34cbef1df6b4109ac2b511 > ) I don't feel strongly about this btw, was just thinking out loud, and also lets you move asahi.html around (e.g. split into subpages) without having to worry about breaking old links. > > The one thing it's missing is an explicit description of how stride is > > computed and used. I can see the 'obvious' way to do it for this and > > compression, but it would be good to make it explicit, given some > > misadventures in the past. CCS is probably the gold standard to refer > > to here. > > Ah, right -- thanks for raising that! I'll add this for v3. Indeed, I > picked the "obvious" way, given said misadventures with AFBC ;) > > This is the relevant Mesa code: > > /* > * For WSI purposes, we need to associate a stride with all layouts. > * In the hardware, only strided linear images have an associated > * stride, there is no natural stride associated with twiddled > * images. However, various clients assert that the stride is valid > * for the image if it were linear (even if it is in fact not > * linear). In those cases, by convention we use the minimum valid > * such stride. > */ > static inline uint32_t > ail_get_wsi_stride_B(const struct ail_layout *layout, unsigned level) > { > assert(level == 0 && "Mipmaps cannot be shared as WSI"); > > if (layout->tiling == AIL_TILING_LINEAR) > return ail_get_linear_stride_B(layout, level); > else > return util_format_get_stride(layout->format, layout->width_px); > } > > I can either copy that comment here, or to make that logic more explicit: > > Producers must set the stride to the image width in > pixels times the bytes per pixel. This is a software convention, the > hardware does not use this stride. Hrm. I guess more in keeping with how it's used in other APIs, as well as more kind of future-proof in the sense of not needing possibly gen-specific rounding everywhere, would be to pass (n_tiles_h(buf) * tile_size_bytes) / n_rows_in_tile(format). That gives you the same information as you get for other users of stride, and might help make things more explicit for small tiles as well? Plus would apply pretty obviously to compression. I know it seems pretty inconsequential, but it does help for utils which e.g. dump and copy content. And makes sure no-one can make some dumb assumptions and get it wrong. Cheers, Daniel
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index e41a3cec6a9ed18760f3b0c88ba437c9aba3dd4f..8668c0275677bbc0a82a1028f122bacfb44a867b 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -422,6 +422,7 @@ extern "C" { #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09 #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a #define DRM_FORMAT_MOD_VENDOR_MTK 0x0b +#define DRM_FORMAT_MOD_VENDOR_APPLE 0x0c /* add more to the end as needed */ @@ -1494,6 +1495,63 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier) /* alias for the most common tiling format */ #define DRM_FORMAT_MOD_MTK_16L_32S_TILE DRM_FORMAT_MOD_MTK(MTK_FMT_MOD_TILE_16L32S) +/* + * Apple GPU-tiled layout. + * + * GPU-tiled images are divided into tiles. Tiles are always 16KiB, with + * dimensions depending on the base-format. Within a tile, pixels are fully + * interleaved (Morton order). Tiles themselves are raster-order. + * + * Images must be 16-byte aligned. + * + * For more information see + * https://docs.mesa3d.org/drivers/asahi.html#image-layouts + * + * When lossless compression is impossible, this is the preferred layout. + */ +#define DRM_FORMAT_MOD_APPLE_GPU_TILED fourcc_mod_code(APPLE, 1) + +/* + * Apple compressed GPU-tiled layout. + * + * Compressed GPU-tiled images contain a body laid out like + * DRM_FORMAT_MOD_APPLE_GPU_TILED followed by a metadata section. + * + * The metadata section contains 8 bytes for each 16x16 compression subtile. The + * metadata section pads the image to power-of-two dimensions, and compression + * subtiles are interleaved (Morton order). By convention, the metadata + * immediately follows the body, after padding the body to 128-bytes. + * + * Images must be 16-byte aligned. + * + * This is the preferred layout. + */ +#define DRM_FORMAT_MOD_APPLE_GPU_TILED_COMPRESSED fourcc_mod_code(APPLE, 2) + +/* + * Apple twiddled layout. + * + * Twiddled images are padded to power-of-two dimensions, with pixels fully + * interleaved (Morton order). + * + * Images must be 16-byte aligned. + * + * GPU-tiling is preferred to twiddling. Twiddled images are mainly useful for + * sparse images, due to a limitation of the PBE unit. + */ +#define DRM_FORMAT_MOD_APPLE_TWIDDLED fourcc_mod_code(APPLE, 3) + +/* + * Apple compressed twiddled layout. + * + * Compressed twiddled images contain a body laid out like + * DRM_FORMAT_MOD_APPLE_TWIDDLED layout followed by metadata laid out like + * DRM_FORMAT_MOD_APPLE_GPU_TILED_COMPRESSED metadata. + * + * Images must be 16-byte aligned. + */ +#define DRM_FORMAT_MOD_APPLE_TWIDDLED_COMPRESSED fourcc_mod_code(APPLE, 4) + /* * AMD modifiers *
Apple GPUs support various non-linear image layouts. Add modifiers for these layouts. Mesa requires these modifiers to share non-linear buffers across processes, but no other userspace or kernel support is required/expected. These layouts are notably not used for interchange across hardware blocks (e.g. with the display controller). There are other layouts for that but we don't support them either in userspace or kernelspace yet (even downstream), so we don't add modifiers here. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> --- Changes in v2: - Rename "Twiddled" to "GPU-tiled" to match what I now believe is the canonical name. - Add modifiers for the actual "Twiddled" layouts. - Clarify that the body of compressed images are laid out like their uncompressed counterparts. - Link to v1: https://lore.kernel.org/r/20250218-apple-twiddled-modifiers-v1-1-8551bab4321f@rosenzweig.io --- include/uapi/drm/drm_fourcc.h | 58 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) --- base-commit: 0ed1356af8f629ae807963b7db4e501e3b580bc2 change-id: 20250218-apple-twiddled-modifiers-fde1a6f4300c Best regards,