diff mbox series

[v2] drm: add modifiers for Apple GPU layouts

Message ID 20250225-apple-twiddled-modifiers-v2-1-cf69729e87f6@rosenzweig.io (mailing list archive)
State New
Headers show
Series [v2] drm: add modifiers for Apple GPU layouts | expand

Commit Message

Alyssa Rosenzweig Feb. 25, 2025, 9:35 p.m. UTC
Apple GPUs support various non-linear image layouts. Add modifiers for
these layouts. Mesa requires these modifiers to share non-linear buffers
across processes, but no other userspace or kernel support is
required/expected.

These layouts are notably not used for interchange across hardware
blocks (e.g. with the display controller). There are other layouts for
that but we don't support them either in userspace or kernelspace yet
(even downstream), so we don't add modifiers here.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
---
Changes in v2:
- Rename "Twiddled" to "GPU-tiled" to match what I now believe is the canonical name.
- Add modifiers for the actual "Twiddled" layouts.
- Clarify that the body of compressed images are laid out like their
  uncompressed counterparts.
- Link to v1: https://lore.kernel.org/r/20250218-apple-twiddled-modifiers-v1-1-8551bab4321f@rosenzweig.io
---
 include/uapi/drm/drm_fourcc.h | 58 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)


---
base-commit: 0ed1356af8f629ae807963b7db4e501e3b580bc2
change-id: 20250218-apple-twiddled-modifiers-fde1a6f4300c

Best regards,

Comments

Daniel Stone Feb. 25, 2025, 9:59 p.m. UTC | #1
Hi,

On Tue, 25 Feb 2025 at 21:35, Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:
> These layouts are notably not used for interchange across hardware
> blocks (e.g. with the display controller). There are other layouts for
> that but we don't support them either in userspace or kernelspace yet
> (even downstream), so we don't add modifiers here.

Yeah, when those have users with good definitions matching these, we
can add them here, even if they are downstream. Anything the GPU would
share out of context, or the codec, would be good for this.

> +/*
> + * Apple GPU-tiled layout.
> + *
> + * GPU-tiled images are divided into tiles. Tiles are always 16KiB, with
> + * dimensions depending on the base-format. Within a tile, pixels are fully
> + * interleaved (Morton order). Tiles themselves are raster-order.
> + *
> + * Images must be 16-byte aligned.
> + *
> + * For more information see
> + * https://docs.mesa3d.org/drivers/asahi.html#image-layouts

This writeup is really nice. I would prefer it was inlined here though
(similar to AFBC), with Mesa then referring to the kernel, but tbf
Vivante does have a similar external reference.

The one thing it's missing is an explicit description of how stride is
computed and used. I can see the 'obvious' way to do it for this and
compression, but it would be good to make it explicit, given some
misadventures in the past. CCS is probably the gold standard to refer
to here.

I'd be very happy to merge it with that.

Cheers,
Daniel
Alyssa Rosenzweig Feb. 25, 2025, 10:18 p.m. UTC | #2
> > These layouts are notably not used for interchange across hardware
> > blocks (e.g. with the display controller). There are other layouts for
> > that but we don't support them either in userspace or kernelspace yet
> > (even downstream), so we don't add modifiers here.
> 
> Yeah, when those have users with good definitions matching these, we
> can add them here, even if they are downstream. Anything the GPU would
> share out of context, or the codec, would be good for this.

Sure. The mentioned "other layouts" are for scanout (GPU->display), and
apparently the GPU can render but not sample them... You can imagine
about how well that would go without a vendor extension + compositor
patches......

(Currently we use linear framebuffers for scanout and avoid that rat's
nest.)

> 
> > +/*
> > + * Apple GPU-tiled layout.
> > + *
> > + * GPU-tiled images are divided into tiles. Tiles are always 16KiB, with
> > + * dimensions depending on the base-format. Within a tile, pixels are fully
> > + * interleaved (Morton order). Tiles themselves are raster-order.
> > + *
> > + * Images must be 16-byte aligned.
> > + *
> > + * For more information see
> > + * https://docs.mesa3d.org/drivers/asahi.html#image-layouts
> 
> This writeup is really nice. I would prefer it was inlined here though
> (similar to AFBC), with Mesa then referring to the kernel, but tbf
> Vivante does have a similar external reference.

Thanks :-) I wasn't sure which way would be better. Most of the
complexity in that writeup relates to mipmapping and arrayed images,
which I don't think WSI hits...? Also, the Mesa docs are easier for me
to update, support tables and LaTeX, have other bits of hw writeups on
the same page, etc... so they feel like a better home for the unabridged
version.  And since Vivante did this, I figured it was ok.

If people feel strongly I can of course inline it, it's just not clear
to me that dumping all that info into the header here is actually
desired. (And there's even more info Marge queued ...
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33743/diffs?commit_id=5ddb57ceb46d42096a34cbef1df6b4109ac2b511
)

> The one thing it's missing is an explicit description of how stride is
> computed and used. I can see the 'obvious' way to do it for this and
> compression, but it would be good to make it explicit, given some
> misadventures in the past. CCS is probably the gold standard to refer
> to here.

Ah, right -- thanks for raising that! I'll add this for v3. Indeed, I
picked the "obvious" way, given said misadventures with AFBC ;)

This is the relevant Mesa code:

   /*
    * For WSI purposes, we need to associate a stride with all layouts.
    * In the hardware, only strided linear images have an associated
    * stride, there is no natural stride associated with twiddled
    * images.  However, various clients assert that the stride is valid
    * for the image if it were linear (even if it is in fact not
    * linear). In those cases, by convention we use the minimum valid
    * such stride.
    */
   static inline uint32_t
   ail_get_wsi_stride_B(const struct ail_layout *layout, unsigned level)
   {
      assert(level == 0 && "Mipmaps cannot be shared as WSI");
   
      if (layout->tiling == AIL_TILING_LINEAR)
         return ail_get_linear_stride_B(layout, level);
      else
         return util_format_get_stride(layout->format, layout->width_px);
   }

I can either copy that comment here, or to make that logic more explicit:

    Producers must set the stride to the image width in
    pixels times the bytes per pixel. This is a software convention, the
    hardware does not use this stride.

Thanks,

Alyssa
Daniel Stone Feb. 26, 2025, 1:55 p.m. UTC | #3
Hey,

On Tue, 25 Feb 2025 at 22:18, Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:
> > > These layouts are notably not used for interchange across hardware
> > > blocks (e.g. with the display controller). There are other layouts for
> > > that but we don't support them either in userspace or kernelspace yet
> > > (even downstream), so we don't add modifiers here.
> >
> > Yeah, when those have users with good definitions matching these, we
> > can add them here, even if they are downstream. Anything the GPU would
> > share out of context, or the codec, would be good for this.
>
> Sure. The mentioned "other layouts" are for scanout (GPU->display), and
> apparently the GPU can render but not sample them... You can imagine
> about how well that would go without a vendor extension + compositor
> patches......
>
> (Currently we use linear framebuffers for scanout and avoid that rat's
> nest.)

Heh, impressive. Are those the twiddled-but-not-tiled ones?

> > > +/*
> > > + * Apple GPU-tiled layout.
> > > + *
> > > + * GPU-tiled images are divided into tiles. Tiles are always 16KiB, with
> > > + * dimensions depending on the base-format. Within a tile, pixels are fully
> > > + * interleaved (Morton order). Tiles themselves are raster-order.
> > > + *
> > > + * Images must be 16-byte aligned.
> > > + *
> > > + * For more information see
> > > + * https://docs.mesa3d.org/drivers/asahi.html#image-layouts
> >
> > This writeup is really nice. I would prefer it was inlined here though
> > (similar to AFBC), with Mesa then referring to the kernel, but tbf
> > Vivante does have a similar external reference.
>
> Thanks :-) I wasn't sure which way would be better. Most of the
> complexity in that writeup relates to mipmapping and arrayed images,
> which I don't think WSI hits...?

Yeah, anything that wouldn't be exported out of a GPU API context
doesn't need to be in here!

> Also, the Mesa docs are easier for me
> to update, support tables and LaTeX, have other bits of hw writeups on
> the same page, etc... so they feel like a better home for the unabridged
> version.  And since Vivante did this, I figured it was ok.
>
> If people feel strongly I can of course inline it, it's just not clear
> to me that dumping all that info into the header here is actually
> desired. (And there's even more info Marge queued ...
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33743/diffs?commit_id=5ddb57ceb46d42096a34cbef1df6b4109ac2b511
> )

I don't feel strongly about this btw, was just thinking out loud, and
also lets you move asahi.html around (e.g. split into subpages)
without having to worry about breaking old links.

> > The one thing it's missing is an explicit description of how stride is
> > computed and used. I can see the 'obvious' way to do it for this and
> > compression, but it would be good to make it explicit, given some
> > misadventures in the past. CCS is probably the gold standard to refer
> > to here.
>
> Ah, right -- thanks for raising that! I'll add this for v3. Indeed, I
> picked the "obvious" way, given said misadventures with AFBC ;)
>
> This is the relevant Mesa code:
>
>    /*
>     * For WSI purposes, we need to associate a stride with all layouts.
>     * In the hardware, only strided linear images have an associated
>     * stride, there is no natural stride associated with twiddled
>     * images.  However, various clients assert that the stride is valid
>     * for the image if it were linear (even if it is in fact not
>     * linear). In those cases, by convention we use the minimum valid
>     * such stride.
>     */
>    static inline uint32_t
>    ail_get_wsi_stride_B(const struct ail_layout *layout, unsigned level)
>    {
>       assert(level == 0 && "Mipmaps cannot be shared as WSI");
>
>       if (layout->tiling == AIL_TILING_LINEAR)
>          return ail_get_linear_stride_B(layout, level);
>       else
>          return util_format_get_stride(layout->format, layout->width_px);
>    }
>
> I can either copy that comment here, or to make that logic more explicit:
>
>     Producers must set the stride to the image width in
>     pixels times the bytes per pixel. This is a software convention, the
>     hardware does not use this stride.

Hrm. I guess more in keeping with how it's used in other APIs, as well
as more kind of future-proof in the sense of not needing possibly
gen-specific rounding everywhere, would be to pass (n_tiles_h(buf) *
tile_size_bytes) / n_rows_in_tile(format). That gives you the same
information as you get for other users of stride, and might help make
things more explicit for small tiles as well? Plus would apply pretty
obviously to compression.

I know it seems pretty inconsequential, but it does help for utils
which e.g. dump and copy content. And makes sure no-one can make some
dumb assumptions and get it wrong.

Cheers,
Daniel
diff mbox series

Patch

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index e41a3cec6a9ed18760f3b0c88ba437c9aba3dd4f..8668c0275677bbc0a82a1028f122bacfb44a867b 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -422,6 +422,7 @@  extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
 #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
 #define DRM_FORMAT_MOD_VENDOR_MTK     0x0b
+#define DRM_FORMAT_MOD_VENDOR_APPLE   0x0c
 
 /* add more to the end as needed */
 
@@ -1494,6 +1495,63 @@  drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
 /* alias for the most common tiling format */
 #define DRM_FORMAT_MOD_MTK_16L_32S_TILE  DRM_FORMAT_MOD_MTK(MTK_FMT_MOD_TILE_16L32S)
 
+/*
+ * Apple GPU-tiled layout.
+ *
+ * GPU-tiled images are divided into tiles. Tiles are always 16KiB, with
+ * dimensions depending on the base-format. Within a tile, pixels are fully
+ * interleaved (Morton order). Tiles themselves are raster-order.
+ *
+ * Images must be 16-byte aligned.
+ *
+ * For more information see
+ * https://docs.mesa3d.org/drivers/asahi.html#image-layouts
+ *
+ * When lossless compression is impossible, this is the preferred layout.
+ */
+#define DRM_FORMAT_MOD_APPLE_GPU_TILED fourcc_mod_code(APPLE, 1)
+
+/*
+ * Apple compressed GPU-tiled layout.
+ *
+ * Compressed GPU-tiled images contain a body laid out like
+ * DRM_FORMAT_MOD_APPLE_GPU_TILED followed by a metadata section.
+ *
+ * The metadata section contains 8 bytes for each 16x16 compression subtile. The
+ * metadata section pads the image to power-of-two dimensions, and compression
+ * subtiles are interleaved (Morton order). By convention, the metadata
+ * immediately follows the body, after padding the body to 128-bytes.
+ *
+ * Images must be 16-byte aligned.
+ *
+ * This is the preferred layout.
+ */
+#define DRM_FORMAT_MOD_APPLE_GPU_TILED_COMPRESSED fourcc_mod_code(APPLE, 2)
+
+/*
+ * Apple twiddled layout.
+ *
+ * Twiddled images are padded to power-of-two dimensions, with pixels fully
+ * interleaved (Morton order).
+ *
+ * Images must be 16-byte aligned.
+ *
+ * GPU-tiling is preferred to twiddling. Twiddled images are mainly useful for
+ * sparse images, due to a limitation of the PBE unit.
+ */
+#define DRM_FORMAT_MOD_APPLE_TWIDDLED fourcc_mod_code(APPLE, 3)
+
+/*
+ * Apple compressed twiddled layout.
+ *
+ * Compressed twiddled images contain a body laid out like
+ * DRM_FORMAT_MOD_APPLE_TWIDDLED layout followed by metadata laid out like
+ * DRM_FORMAT_MOD_APPLE_GPU_TILED_COMPRESSED metadata.
+ *
+ * Images must be 16-byte aligned.
+ */
+#define DRM_FORMAT_MOD_APPLE_TWIDDLED_COMPRESSED fourcc_mod_code(APPLE, 4)
+
 /*
  * AMD modifiers
  *