diff mbox series

[v2,2/7] drm/vkms: Add support for multy-planar framebuffers

Message ID 20240110-vkms-yuv-v2-2-952fcaa5a193@riseup.net (mailing list archive)
State New, archived
Headers show
Series Add YUV formats to VKMS | expand

Commit Message

Arthur Grillo Jan. 10, 2024, 5:44 p.m. UTC
Add support to multy-planar formats by adding an index argument to the
framebuffer data access functions.

Also, give all the planes to the conversion functions. This, for now,
should be noop, as all the supported formats have only one plane.

Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
---
 drivers/gpu/drm/vkms/vkms_drv.h     |  2 +-
 drivers/gpu/drm/vkms/vkms_formats.c | 73 +++++++++++++++++++++----------------
 2 files changed, 42 insertions(+), 33 deletions(-)

Comments

Louis Chauvet Feb. 1, 2024, 5:38 p.m. UTC | #1
Hello,

I think there are some bugs in this implementation of multi-planar 
support (and not mylty-planar, there is a typo in the title), especially 
for the "new" drm_format_info description which uses block_w and block_h.

I will propose two patches [1] solving these issues and hopefully also 
simplifying a bit the composition.

TBH I'm not an expert, it's my first ever contribution in DRM, so please 
don't hesitate to correct me if you thin I missunderstood something, it 
actually took me a bit of time to fully understand the whole series and 
how it interacted with the rest of the vkms driver.

> -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> +static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y, size_t index)
>  {
>  	struct drm_framebuffer *fb = frame_info->fb;
>  
> -	return fb->offsets[0] + (y * fb->pitches[0])
> -			      + (x * fb->format->cpp[0]);
> +	return fb->offsets[index] + (y * fb->pitches[index])
> +				  + (x * fb->format->cpp[index]);
>  }
>  
>  /*
> @@ -23,27 +23,25 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
>   * @frame_info: Buffer metadata
>   * @x: The x(width) coordinate of the 2D buffer
>   * @y: The y(Heigth) coordinate of the 2D buffer
> + * @index: The index of the plane on the 2D buffer
>   *
>   * Takes the information stored in the frame_info, a pair of coordinates, and
> - * returns the address of the first color channel.
> - * This function assumes the channels are packed together, i.e. a color channel
> - * comes immediately after another in the memory. And therefore, this function
> - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> + * returns the address of the first color channel on the desired index.
>   */
>  static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> -				int x, int y)
> +				int x, int y, size_t index)
>  {
> -	size_t offset = pixel_offset(frame_info, x, y);
> +	size_t offset = pixel_offset(frame_info, x, y, index);
>  
>  	return (u8 *)frame_info->map[0].vaddr + offset;
>  }

This implementation of packed_pixels_addr will only work with
block_w == block_h == 1. For packed or tiled formats we will need to use
x/y information to extract the correct address, and this address will not 
be a single pixel. See below my explanation.

[...]

> @@ -130,17 +128,28 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
>  {
>  	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>  	struct vkms_frame_info *frame_info = plane->frame_info;
> -	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> +	const struct drm_format_info *frame_format = frame_info->fb->format;
>  	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
> +	u8 *src_pixels[DRM_FORMAT_MAX_PLANES];
>  
> -	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
> +	for (size_t i = 0; i < frame_format->num_planes; i++)
> +		src_pixels[i] = get_packed_src_addr(frame_info, y, i);
> +
> +	for (size_t x = 0; x < limit; x++) {
>  		int x_pos = get_x_position(frame_info, limit, x);
>  
> -		if (drm_rotation_90_or_270(frame_info->rotation))
> -			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
> -				+ frame_info->fb->format->cpp[0] * y;
> +		if (drm_rotation_90_or_270(frame_info->rotation)) {
> +			for (size_t i = 0; i < frame_format->num_planes; i++) {
> +				src_pixels[i] = get_packed_src_addr(frame_info,
> +								    x + frame_info->rotated.y1, i);
> +				src_pixels[i] += frame_format->cpp[i] * y;

I find the current rotation management a bit complex to understand. This 
is not related to your patch, but as I had to understand this to create my 
second patch, I think this could be significanlty simplified.

Please see the below comment about frame_format->cpp, it applies here too. 
I think the "easy" way here is simply to reuse the method 
get_packed_src_addr every time you need a new pixel.

> +			}
> +		}
>  
>		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
> +

The usage of cpp and pointer to specific pixel only work for non-packed 
and non-blocked pixels, but for example NV30 or Y0L0 need more 
informations about the exact location of the pixel to convert and write 
the correct pixel value (each pixel can't be referenced directly by a 
pointer). For example NV30 uses 5 bytes to store 3 pixels (10 bits each), 
so to access the "middle" one you need to read the 5 bytes and do a small 
computation to extract it's value.

I think a simple solution to handle most cases would be to profide two 
more parameters: the x and y positions of the pixel to copy, using 
"absolute coordinates" (i.e x=0,y=0 means the first byte of the src 
buffer, not the first pixel in the `drm_rect src`, this way the method 
`pixel_read` can extract the correct value).

This way it become easy to manage "complex" pixel representations in this 
loop: simply increment x/y and let the pixel_read method handle 
everything.

The second patch I will send is doing this. And as explained before, it 
will also simplify a lot the code related to rotation and translation (no 
more switch case everywhere to add offset to x/y, it simply use drm_rect_* 
helpers).

It's not optimal in term of performance (in some situation it will read 
the same block multiple time to generate different pixels), but I 
believe it still is an intersting trade-off.

In the future, if performance is actally critical, the whole composition 
loop will have to be specialized for each pixel formats: some can be 
treated line by line (as it's done today), but with blocks or packed 
pixels it's more complex.

> +		for (size_t i = 0; i < frame_format->num_planes; i++)
> +			src_pixels[i] += frame_format->cpp[i];

This is likely working with format with block_w != 1, see explanation 
above.

[...]

[1]: https://lore.kernel.org/dri-devel/20240201-yuv-v1-0-3ca376f27632@bootlin.com/T/#t
Arthur Grillo Feb. 2, 2024, 6:49 p.m. UTC | #2
On 01/02/24 14:38, Louis Chauvet wrote:
>>  
>>  /*
>> @@ -23,27 +23,25 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
>>   * @frame_info: Buffer metadata
>>   * @x: The x(width) coordinate of the 2D buffer
>>   * @y: The y(Heigth) coordinate of the 2D buffer
>> + * @index: The index of the plane on the 2D buffer
>>   *
>>   * Takes the information stored in the frame_info, a pair of coordinates, and
>> - * returns the address of the first color channel.
>> - * This function assumes the channels are packed together, i.e. a color channel
>> - * comes immediately after another in the memory. And therefore, this function
>> - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>> + * returns the address of the first color channel on the desired index.
>>   */
>>  static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
>> -				int x, int y)
>> +				int x, int y, size_t index)
>>  {
>> -	size_t offset = pixel_offset(frame_info, x, y);
>> +	size_t offset = pixel_offset(frame_info, x, y, index);
>>  
>>  	return (u8 *)frame_info->map[0].vaddr + offset;
>>  }
> 
> This implementation of packed_pixels_addr will only work with
> block_w == block_h == 1. For packed or tiled formats we will need to use
> x/y information to extract the correct address, and this address will not 
> be a single pixel. See below my explanation.

You're right, currently, VKMS only supports non-packed/tiled formats. As
all the formats I plan to add are too not packed or tiled, I haven't
added support to it. But if you want to add it, please do :).

>> @@ -130,17 +128,28 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
>>  {
>>  	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>>  	struct vkms_frame_info *frame_info = plane->frame_info;
>> -	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>> +	const struct drm_format_info *frame_format = frame_info->fb->format;
>>  	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
>> +	u8 *src_pixels[DRM_FORMAT_MAX_PLANES];
>>  
>> -	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
>> +	for (size_t i = 0; i < frame_format->num_planes; i++)
>> +		src_pixels[i] = get_packed_src_addr(frame_info, y, i);
>> +
>> +	for (size_t x = 0; x < limit; x++) {
>>  		int x_pos = get_x_position(frame_info, limit, x);
>>  
>> -		if (drm_rotation_90_or_270(frame_info->rotation))
>> -			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
>> -				+ frame_info->fb->format->cpp[0] * y;
>> +		if (drm_rotation_90_or_270(frame_info->rotation)) {
>> +			for (size_t i = 0; i < frame_format->num_planes; i++) {
>> +				src_pixels[i] = get_packed_src_addr(frame_info,
>> +								    x + frame_info->rotated.y1, i);
>> +				src_pixels[i] += frame_format->cpp[i] * y;
> 
> I find the current rotation management a bit complex to understand. This 
> is not related to your patch, but as I had to understand this to create my 
> second patch, I think this could be significanlty simplified.

I also found the rotation logic complex when implementing this. I would
appreciate it if it were simplified.

> 
> Please see the below comment about frame_format->cpp, it applies here too. 
> I think the "easy" way here is simply to reuse the method 
> get_packed_src_addr every time you need a new pixel.
> 
>> +			}
>> +		}
>>  
>> 		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
>> +
> 
> The usage of cpp and pointer to specific pixel only work for non-packed 
> and non-blocked pixels, but for example NV30 or Y0L0 need more 
> informations about the exact location of the pixel to convert and write 
> the correct pixel value (each pixel can't be referenced directly by a 
> pointer). For example NV30 uses 5 bytes to store 3 pixels (10 bits each), 
> so to access the "middle" one you need to read the 5 bytes and do a small 
> computation to extract it's value.

Great explanation, I can see what is the problem here.

> 
> I think a simple solution to handle most cases would be to profide two 
> more parameters: the x and y positions of the pixel to copy, using 
> "absolute coordinates" (i.e x=0,y=0 means the first byte of the src 
> buffer, not the first pixel in the `drm_rect src`, this way the method 
> `pixel_read` can extract the correct value).
> 
> This way it become easy to manage "complex" pixel representations in this 
> loop: simply increment x/y and let the pixel_read method handle 
> everything.
> 
> The second patch I will send is doing this. And as explained before, it 
> will also simplify a lot the code related to rotation and translation (no 
> more switch case everywhere to add offset to x/y, it simply use drm_rect_* 
> helpers).

I like this, expect my review soon :).

> 
> It's not optimal in term of performance (in some situation it will read 
> the same block multiple time to generate different pixels), but I 
> believe it still is an intersting trade-off.
> 
> In the future, if performance is actally critical, the whole composition 
> loop will have to be specialized for each pixel formats: some can be 
> treated line by line (as it's done today), but with blocks or packed 
> pixels it's more complex.
> 
>> +		for (size_t i = 0; i < frame_format->num_planes; i++)
>> +			src_pixels[i] += frame_format->cpp[i];
> 
> This is likely working with format with block_w != 1, see explanation 
> above.

I think you meant that is _not_ working. Yeah, as I already explained,
it was never my plan to add support for packed or tiled formats.

Best Regards,
~Arthur Grillo
diff mbox series

Patch

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index b4b357447292..c38590562e4b 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -56,7 +56,7 @@  struct vkms_writeback_job {
 struct vkms_plane_state {
 	struct drm_shadow_plane_state base;
 	struct vkms_frame_info *frame_info;
-	void (*pixel_read)(u8 *src_buffer, struct pixel_argb_u16 *out_pixel);
+	void (*pixel_read)(u8 **src_buffer, struct pixel_argb_u16 *out_pixel);
 };
 
 struct vkms_plane {
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 172830a3936a..5566a7cd7bb4 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -9,12 +9,12 @@ 
 
 #include "vkms_formats.h"
 
-static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
+static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y, size_t index)
 {
 	struct drm_framebuffer *fb = frame_info->fb;
 
-	return fb->offsets[0] + (y * fb->pitches[0])
-			      + (x * fb->format->cpp[0]);
+	return fb->offsets[index] + (y * fb->pitches[index])
+				  + (x * fb->format->cpp[index]);
 }
 
 /*
@@ -23,27 +23,25 @@  static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
  * @frame_info: Buffer metadata
  * @x: The x(width) coordinate of the 2D buffer
  * @y: The y(Heigth) coordinate of the 2D buffer
+ * @index: The index of the plane on the 2D buffer
  *
  * Takes the information stored in the frame_info, a pair of coordinates, and
- * returns the address of the first color channel.
- * This function assumes the channels are packed together, i.e. a color channel
- * comes immediately after another in the memory. And therefore, this function
- * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
+ * returns the address of the first color channel on the desired index.
  */
 static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
-				int x, int y)
+				int x, int y, size_t index)
 {
-	size_t offset = pixel_offset(frame_info, x, y);
+	size_t offset = pixel_offset(frame_info, x, y, index);
 
 	return (u8 *)frame_info->map[0].vaddr + offset;
 }
 
-static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
+static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y, size_t index)
 {
 	int x_src = frame_info->src.x1 >> 16;
 	int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
 
-	return packed_pixels_addr(frame_info, x_src, y_src);
+	return packed_pixels_addr(frame_info, x_src, y_src, index);
 }
 
 static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
@@ -53,7 +51,7 @@  static int get_x_position(const struct vkms_frame_info *frame_info, int limit, i
 	return x;
 }
 
-static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void ARGB8888_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel)
 {
 	/*
 	 * The 257 is the "conversion ratio". This number is obtained by the
@@ -61,23 +59,23 @@  static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixe
 	 * the best color value in a pixel format with more possibilities.
 	 * A similar idea applies to others RGB color conversions.
 	 */
-	out_pixel->a = (u16)src_pixels[3] * 257;
-	out_pixel->r = (u16)src_pixels[2] * 257;
-	out_pixel->g = (u16)src_pixels[1] * 257;
-	out_pixel->b = (u16)src_pixels[0] * 257;
+	out_pixel->a = (u16)src_pixels[0][3] * 257;
+	out_pixel->r = (u16)src_pixels[0][2] * 257;
+	out_pixel->g = (u16)src_pixels[0][1] * 257;
+	out_pixel->b = (u16)src_pixels[0][0] * 257;
 }
 
-static void XRGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void XRGB8888_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel)
 {
 	out_pixel->a = (u16)0xffff;
-	out_pixel->r = (u16)src_pixels[2] * 257;
-	out_pixel->g = (u16)src_pixels[1] * 257;
-	out_pixel->b = (u16)src_pixels[0] * 257;
+	out_pixel->r = (u16)src_pixels[0][2] * 257;
+	out_pixel->g = (u16)src_pixels[0][1] * 257;
+	out_pixel->b = (u16)src_pixels[0][0] * 257;
 }
 
-static void ARGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void ARGB16161616_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel)
 {
-	u16 *pixels = (u16 *)src_pixels;
+	u16 *pixels = (u16 *)src_pixels[0];
 
 	out_pixel->a = le16_to_cpu(pixels[3]);
 	out_pixel->r = le16_to_cpu(pixels[2]);
@@ -85,9 +83,9 @@  static void ARGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_
 	out_pixel->b = le16_to_cpu(pixels[0]);
 }
 
-static void XRGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void XRGB16161616_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel)
 {
-	u16 *pixels = (u16 *)src_pixels;
+	u16 *pixels = (u16 *)src_pixels[0];
 
 	out_pixel->a = (u16)0xffff;
 	out_pixel->r = le16_to_cpu(pixels[2]);
@@ -95,9 +93,9 @@  static void XRGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_
 	out_pixel->b = le16_to_cpu(pixels[0]);
 }
 
-static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void RGB565_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel)
 {
-	u16 *pixels = (u16 *)src_pixels;
+	u16 *pixels = (u16 *)src_pixels[0];
 
 	s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31));
 	s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63));
@@ -130,17 +128,28 @@  void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
 {
 	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
 	struct vkms_frame_info *frame_info = plane->frame_info;
-	u8 *src_pixels = get_packed_src_addr(frame_info, y);
+	const struct drm_format_info *frame_format = frame_info->fb->format;
 	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
+	u8 *src_pixels[DRM_FORMAT_MAX_PLANES];
 
-	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
+	for (size_t i = 0; i < frame_format->num_planes; i++)
+		src_pixels[i] = get_packed_src_addr(frame_info, y, i);
+
+	for (size_t x = 0; x < limit; x++) {
 		int x_pos = get_x_position(frame_info, limit, x);
 
-		if (drm_rotation_90_or_270(frame_info->rotation))
-			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
-				+ frame_info->fb->format->cpp[0] * y;
+		if (drm_rotation_90_or_270(frame_info->rotation)) {
+			for (size_t i = 0; i < frame_format->num_planes; i++) {
+				src_pixels[i] = get_packed_src_addr(frame_info,
+								    x + frame_info->rotated.y1, i);
+				src_pixels[i] += frame_format->cpp[i] * y;
+			}
+		}
 
 		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
+
+		for (size_t i = 0; i < frame_format->num_planes; i++)
+			src_pixels[i] += frame_format->cpp[i];
 	}
 }
 
@@ -221,7 +230,7 @@  void vkms_writeback_row(struct vkms_writeback_job *wb,
 {
 	struct vkms_frame_info *frame_info = &wb->wb_frame_info;
 	int x_dst = frame_info->dst.x1;
-	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y, 0);
 	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
 	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);