clk: zynqmp: Work around broken DT GPU node

Message ID 20241031170015.55243-1-marex@denx.de (mailing list archive)
State New

Commit Message

Marek Vasut Oct. 31, 2024, 4:59 p.m. UTC
The ZynqMP DT GPU node clock description is wrong and does not represent
the hardware correctly: it describes only the BUS and PP0 clocks and is
missing the PP1 clock. That means the PP1 clock can never be enabled when
the GPU is in use, which leads to an expected GPU hang even with simple
tests like kmscube.

Since Xilinx does use generated DTs on ZynqMP, the current broken DT
implementation has to be supported. Add a workaround for this breakage
to the clock driver: on an attempt to enable PP0, enable PP1 as well,
and vice versa. This way, the GPU works and does not hang because one
of its pixel pipeline clocks is not enabled.

Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Michal Simek <michal.simek@amd.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-clk@vger.kernel.org
---
 drivers/clk/zynqmp/clk-gate-zynqmp.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
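
For readers who want to see the pairing logic in isolation, the following
is a minimal user-space sketch of the enable/disable paths described in the
commit message. It is not the driver code: the `mock_*` functions, the
`MOCK_*` clock IDs and the call log are hypothetical stand-ins for
`zynqmp_pm_clock_enable()`, `zynqmp_pm_clock_disable()` and the IDs from
`<dt-bindings/clock/xlnx-zynqmp-clk.h>`, and `gate_disable()` returns a
value here only so the result can be checked (the driver's disable
callback returns void).

```c
#include <assert.h>

/* Hypothetical IDs; the real values come from xlnx-zynqmp-clk.h. */
enum { MOCK_GPU_REF = 1, MOCK_GPU_PP0_REF = 2, MOCK_GPU_PP1_REF = 3 };

#define MAX_CALLS 8
static int call_log[MAX_CALLS];	/* +id = enable request, -id = disable request */
static int call_count;
static int fail_on_id;		/* when non-zero, mock calls for this id fail */

/* Record one mock firmware request and report success or failure. */
static int mock_pm_call(int id, int sign)
{
	assert(call_count < MAX_CALLS);
	call_log[call_count++] = sign * id;
	return id == fail_on_id ? -1 : 0;
}

static int mock_pm_clock_enable(int id)  { return mock_pm_call(id, +1); }
static int mock_pm_clock_disable(int id) { return mock_pm_call(id, -1); }

/* Enable path: a request for either PP clock enables PP0, then PP1. */
static int gate_enable(int clk_id)
{
	int ret;

	if (clk_id == MOCK_GPU_PP0_REF || clk_id == MOCK_GPU_PP1_REF) {
		ret = mock_pm_clock_enable(MOCK_GPU_PP0_REF);
		if (!ret)
			ret = mock_pm_clock_enable(MOCK_GPU_PP1_REF);
	} else {
		ret = mock_pm_clock_enable(clk_id);
	}
	return ret;
}

/* Disable path: a request for either PP clock disables PP1, then PP0. */
static int gate_disable(int clk_id)
{
	int ret;

	if (clk_id == MOCK_GPU_PP0_REF || clk_id == MOCK_GPU_PP1_REF) {
		ret = mock_pm_clock_disable(MOCK_GPU_PP1_REF);
		if (!ret)
			ret = mock_pm_clock_disable(MOCK_GPU_PP0_REF);
	} else {
		ret = mock_pm_clock_disable(clk_id);
	}
	return ret;
}
```

In this model, `gate_enable(MOCK_GPU_PP0_REF)` records an enable request for
PP0 followed by PP1, and `gate_disable()` on either ID records the reverse
order. If the first mock call fails, the second one is skipped, which
mirrors the `if (!ret)` checks in the patch.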

Comments

Sagar, Vishal Nov. 11, 2024, 2:33 p.m. UTC | #1
Hi Marek,

Thanks for sharing this patch.

On 10/31/2024 5:59 PM, Marek Vasut wrote:
> The ZynqMP DT GPU node clock description is wrong and does not represent
> the hardware correctly, it only describes BUS and PP0 clock, while it is
> missing PP1 clock. That means PP1 clock can never be enabled when the GPU
> should be used, which leads to expected GPU hang even with simple basic
> tests like kmscube.

Could you please share how you tested this?
Please share the dt node too.
We will also check at our end and revert for this.

> 
> Since Xilinx does use generated DTs on ZynqMP, the current broken DT
> implementation has to be supported. Add a workaround for this breakage
> into the clock driver, in case of PP0 enablement attempt, enable PP1
> as well and vice versa. This way, the GPU does work and does not hang
> because one of its pixel pipeline clock are not enabled.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> ---
> Cc: Michael Turquette <mturquette@baylibre.com>
> Cc: Michal Simek <michal.simek@amd.com>
> Cc: Stephen Boyd <sboyd@kernel.org>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-clk@vger.kernel.org
> ---
>   drivers/clk/zynqmp/clk-gate-zynqmp.c | 17 +++++++++++++++--
>   1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/clk/zynqmp/clk-gate-zynqmp.c b/drivers/clk/zynqmp/clk-gate-zynqmp.c
> index b89e557371984..b013aa33e7abb 100644
> --- a/drivers/clk/zynqmp/clk-gate-zynqmp.c
> +++ b/drivers/clk/zynqmp/clk-gate-zynqmp.c
> @@ -7,6 +7,7 @@
>    * Gated clock implementation
>    */
>   
> +#include <dt-bindings/clock/xlnx-zynqmp-clk.h>
>   #include <linux/clk-provider.h>
>   #include <linux/slab.h>
>   #include "clk-zynqmp.h"
> @@ -38,7 +39,13 @@ static int zynqmp_clk_gate_enable(struct clk_hw *hw)
>   	u32 clk_id = gate->clk_id;
>   	int ret;
>   
> -	ret = zynqmp_pm_clock_enable(clk_id);
> +	if (clk_id == GPU_PP0_REF || clk_id == GPU_PP1_REF) {
> +		ret = zynqmp_pm_clock_enable(GPU_PP0_REF);
> +		if (!ret)
> +			ret = zynqmp_pm_clock_enable(GPU_PP1_REF);
> +	} else {
> +		ret = zynqmp_pm_clock_enable(clk_id);
> +	}
>   
>   	if (ret)
>   		pr_debug("%s() clock enable failed for %s (id %d), ret = %d\n",
> @@ -58,7 +65,13 @@ static void zynqmp_clk_gate_disable(struct clk_hw *hw)
>   	u32 clk_id = gate->clk_id;
>   	int ret;
>   
> -	ret = zynqmp_pm_clock_disable(clk_id);
> +	if (clk_id == GPU_PP0_REF || clk_id == GPU_PP1_REF) {
> +		ret = zynqmp_pm_clock_disable(GPU_PP1_REF);
> +		if (!ret)
> +			ret = zynqmp_pm_clock_disable(GPU_PP0_REF);
> +	} else {
> +		ret = zynqmp_pm_clock_disable(clk_id);
> +	}
>   
>   	if (ret)
>   		pr_debug("%s() clock disable failed for %s (id %d), ret = %d\n",



Regards
Vishal Sagar
Marek Vasut Nov. 11, 2024, 4:25 p.m. UTC | #2
On 11/11/24 3:33 PM, Sagar, Vishal wrote:
> Hi Marek,
> 
> Thanks for sharing this patch.
> 
> On 10/31/2024 5:59 PM, Marek Vasut wrote:
>> The ZynqMP DT GPU node clock description is wrong and does not represent
>> the hardware correctly, it only describes BUS and PP0 clock, while it is
>> missing PP1 clock. That means PP1 clock can never be enabled when the GPU
>> should be used, which leads to expected GPU hang even with simple basic
>> tests like kmscube.
> 
> Could you please share how you tested this?

I tested this by running kmscube, see one line above.

> Please share the dt node too.

The GPU DT node is already in arch/arm64/boot/dts/xilinx/zynqmp.dtsi.

> We will also check at our end and revert for this.
I do not understand this statement. Revert what?
Gajjar, Parth Nov. 12, 2024, 1:17 p.m. UTC | #3
Hi Marek,

We tried running the glmark2-es2-wayland application with the mali and lima drivers and didn't observe any hang. We will also check with the kmscube application.

Attaching logs for clock summary.

Did you try with mali or lima driver?

Regards,
Parth

xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux# cat /sys/kernel/debug/clk/clk_summary | grep gpu
                   gpu_ref_mux       0       0        0        499950000   0          0     50000      Y                     deviceless                      no_connection_id
                      gpu_ref_div1   0       0        0        499950000   0          0     50000      Y                        deviceless                      no_connection_id
                         gpu_ref     0       0        0        499950000   0          0     50000      N                           fd4b0000.gpu                    bus
                            gpu_pp1_ref 0       0        0        499950000   0          0     50000      N                              deviceless                      no_connection_id
                            gpu_pp0_ref 0       0        0        499950000   0          0     50000      N                              fd4b0000.gpu                    core
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux# glmark2-es2-wayland &
[1] 973
xilinx-zcu106-20242:/home/petalinux# =======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      Mesa
    GL_RENDERER:    Mali400
    GL_VERSION:     OpenGL ES 2.0 Mesa 24.0.7
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 653 FrameTime: 1.531 ms
[build] use-vbo=true: FPS: 693 FrameTime: 1.444 ms
[texture] texture-filter=nearest: FPS: 715 FrameTime: 1.400 ms
[texture] texture-filter=linear: FPS: 691 FrameTime: 1.448 ms
[texture] texture-filter=mipmap: FPS: 671 FrameTime: 1.492 ms
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux# cat /sys/kernel/debug/clk/clk_summary | grep gpu
                   gpu_ref_mux       1       1        1        499950000   0          0     50000      Y                     deviceless                      no_connection_id
                      gpu_ref_div1   1       1        1        499950000   0          0     50000      Y                        deviceless                      no_connection_id
                         gpu_ref     2       2        2        499950000   0          0     50000      Y                           fd4b0000.gpu                    bus
                            gpu_pp1_ref 0       0        0        499950000   0          0     50000      Y                              deviceless                      no_connection_id
                            gpu_pp0_ref 1       1        0        499950000   0          0     50000      Y                              fd4b0000.gpu                    core
xilinx-zcu106-20242:/home/petalinux#
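
As an aid to reading the `clk_summary` lines above, here is a small sketch
that splits one line into fields. The column order it assumes (name, enable
count, prepare count, protect count, rate, accuracy, phase, duty cycle,
hardware-enable flag) is an assumption based on the usual `clk_summary`
layout and is not stated in this thread; the struct and function names are
hypothetical. The helper at the end only flags lines where the
hardware-enable column reads `Y` while the Linux enable count is 0. It does
not say who enabled the clock.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct clk_line {
	char name[32];
	int enable_cnt;
	int prepare_cnt;
	int protect_cnt;
	unsigned long rate;
	char hw_enabled;	/* 'Y', 'N' or '?' as printed in the summary */
};

/* Parse one clk_summary line; returns 0 on success, -1 if it does not match. */
static int parse_clk_summary_line(const char *line, struct clk_line *out)
{
	unsigned long accuracy;
	int phase, duty;
	int n;

	assert(line && out);
	/* Assumed column order, see the note above the code. */
	n = sscanf(line, " %31s %d %d %d %lu %lu %d %d %c",
		   out->name, &out->enable_cnt, &out->prepare_cnt,
		   &out->protect_cnt, &out->rate, &accuracy, &phase,
		   &duty, &out->hw_enabled);
	return n == 9 ? 0 : -1;
}

/* Same filter as the `grep gpu` in the log, applied to the clock name. */
static int is_gpu_clock(const struct clk_line *c)
{
	return strncmp(c->name, "gpu_", 4) == 0;
}

/* True when the hardware column says enabled but Linux holds no enable count. */
static int hw_on_without_linux_user(const struct clk_line *c)
{
	return c->hw_enabled == 'Y' && c->enable_cnt == 0;
}
```

Applied to the second dump above, this flags `gpu_pp1_ref` (enable count 0,
hardware-enable `Y`) and does not flag `gpu_pp0_ref` (enable count 1).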
Marek Vasut Nov. 12, 2024, 8 p.m. UTC | #4
On 11/12/24 2:17 PM, Gajjar, Parth wrote:
> Hi Marek,

Hello everyone,

> We tried running the glmark2-es2-wayland application with the mali and lima drivers and didn't observe any hang. We will also check with the kmscube application.
> 
> Attaching logs for clock summary.
> 
> Did you try with mali or lima driver?

I only use lima driver.

Can you share the full boot log of this machine, including the firmware
blob versions? Is it possible that some newer blob(s) enable both PP0
and PP1 internally to work around this clocking issue in Linux?
Gajjar, Parth Nov. 13, 2024, 1:32 p.m. UTC | #5
Hi Marek,

We tried running kmscube application with lima driver and it is working fine.
Attaching application logs and boot logs.

We are using our 6.6 kernel.
Meanwhile we will also check with upstream kernel.

Which kernel version are you using?

Regards,
Parth

====================================
Before running application
====================================
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux# cat /sys/kernel/debug/clk/clk_summary | grep gpu
                   gpu_ref_mux       0       0        0        499950000   0          0     50000      Y                     deviceless                      no_connection_id
                      gpu_ref_div1   0       0        0        499950000   0          0     50000      Y                        deviceless                      no_connection_id
                         gpu_ref     0       0        0        499950000   0          0     50000      N                           fd4b0000.gpu                    bus
                            gpu_pp1_ref 0       0        0        499950000   0          0     50000      N                              deviceless                      no_connection_id
                            gpu_pp0_ref 0       0        0        499950000   0          0     50000      N                              fd4b0000.gpu                    core
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux#



====================================
While running application
====================================
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux# cat /sys/kernel/debug/clk/clk_summary | grep gpu
                   gpu_ref_mux       1       1        1        499950000   0          0     50000      Y                     deviceless                      no_connection_id
                      gpu_ref_div1   1       1        1        499950000   0          0     50000      Y                        deviceless                      no_connection_id
                         gpu_ref     2       2        2        499950000   0          0     50000      Y                           fd4b0000.gpu                    bus
                            gpu_pp1_ref 0       0        0        499950000   0          0     50000      Y                              deviceless                      no_connection_id
                            gpu_pp0_ref 1       1        0        499950000   0          0     50000      Y                              fd4b0000.gpu                    core
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux# cat /proc/interrupts | grep pp0
 56:      53251          0          0          0     GICv2 164 Level     gpmmu, ppmmu0, ppmmu1, gp, pp0, pp1
xilinx-zcu106-20242:/home/petalinux# cat /proc/interrupts | grep pp0
 56:      53413          0          0          0     GICv2 164 Level     gpmmu, ppmmu0, ppmmu1, gp, pp0, pp1
xilinx-zcu106-20242:/home/petalinux# cat /proc/interrupts | grep pp0
 56:      53495          0          0          0     GICv2 164 Level     gpmmu, ppmmu0, ppmmu1, gp, pp0, pp1
xilinx-zcu106-20242:/home/petalinux#




====================================
kmscube logs
====================================
xilinx-zcu106-20242:/home/petalinux#
xilinx-zcu106-20242:/home/petalinux# kmscube
Using display 0xaaaaecf14d50 with EGL version 1.4
===================================
EGL information:
  version: "1.4"
  vendor: "Mesa Project"
  client extensions: "EGL_EXT_client_extensions EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query EGL_EXT_platform_base EGL_KHR_client_get_all_proc_addresses EGL_KHR_debug EGL_EXT_platform_device EGL_EXT_explicit_device EGL_EXT_platform_wayland EGL_KHR_platform_wayland EGL_EXT_platform_x11 EGL_KHR_platform_x11 EGL_EXT_platform_xcb EGL_MESA_platform_gbm EGL_KHR_platform_gbm EGL_MESA_platform_surfaceless"
  display extensions: "EGL_ANDROID_blob_cache EGL_ANDROID_native_fence_sync EGL_EXT_buffer_age EGL_EXT_image_dma_buf_import EGL_EXT_image_dma_buf_import_modifiers EGL_KHR_cl_event2 EGL_KHR_config_attribs EGL_KHR_context_flush_control EGL_KHR_create_context EGL_KHR_create_context_no_error EGL_KHR_fence_sync EGL_KHR_get_all_proc_addresses EGL_KHR_gl_colorspace EGL_KHR_gl_renderbuffer_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_no_config_context EGL_KHR_partial_update EGL_KHR_reusable_sync EGL_KHR_surfaceless_context EGL_EXT_pixel_format_float EGL_KHR_wait_sync EGL_MESA_configless_context EGL_MESA_drm_image EGL_MESA_gl_interop EGL_MESA_image_dma_buf_export EGL_MESA_query_driver EGL_WL_bind_wayland_display "
===================================
OpenGL ES 2.x information:
  version: "OpenGL ES 2.0 Mesa 24.0.7"
  shading language version: "OpenGL ES GLSL ES 1.0.16"
  vendor: "Mesa"
  renderer: "Mali400"
  extensions: "GL_EXT_blend_minmax GL_EXT_multi_draw_arrays GL_EXT_texture_compression_s3tc GL_EXT_texture_compression_dxt1 GL_EXT_texture_format_BGRA8888 GL_OES_compressed_ETC1_RGB8_texture GL_OES_depth24 GL_OES_element_index_uint GL_OES_fbo_render_mipmap GL_OES_mapbuffer GL_OES_rgb8_rgba8 GL_OES_standard_derivatives GL_OES_stencil8 GL_OES_texture_3D GL_OES_texture_half_float GL_OES_texture_half_float_linear GL_OES_texture_npot GL_OES_vertex_half_float GL_OES_EGL_image GL_OES_depth_texture GL_OES_packed_depth_stencil GL_OES_get_program_binary GL_APPLE_texture_max_level GL_EXT_discard_framebuffer GL_EXT_read_format_bgra GL_NV_pack_subimage GL_NV_texture_barrier GL_EXT_frag_depth GL_NV_fbo_color_attachments GL_OES_EGL_image_external GL_OES_EGL_sync GL_OES_vertex_array_object GL_ANGLE_pack_reverse_row_order GL_ANGLE_texture_compression_dxt3 GL_ANGLE_texture_compression_dxt5 GL_EXT_texture_rg GL_EXT_unpack_subimage GL_NV_draw_buffers GL_NV_read_buffer GL_NV_read_depth GL_NV_read_depth_stencil GL_NV_read_stencil GL_APPLE_sync GL_EXT_draw_buffers GL_EXT_map_buffer_range GL_KHR_debug GL_KHR_texture_compression_astc_ldr GL_NV_generate_mipmap_sRGB GL_NV_pixel_buffer_object GL_OES_required_internalformat GL_OES_surfaceless_context GL_EXT_debug_label GL_EXT_separate_shader_objects GL_EXT_compressed_ETC1_RGB8_sub_texture GL_EXT_draw_elements_base_vertex GL_EXT_texture_border_clamp GL_KHR_context_flush_control GL_OES_draw_elements_base_vertex GL_OES_texture_border_clamp GL_EXT_blend_func_extended GL_KHR_no_error GL_KHR_texture_compression_astc_sliced_3d GL_EXT_multisampled_render_to_texture GL_EXT_multisampled_render_to_texture2 GL_EXT_texture_compression_s3tc_srgb GL_EXT_clip_control GL_KHR_parallel_shader_compile GL_MESA_sampler_objects GL_MESA_bgra "
===================================
Rendered 60 frames in 2.001299 sec (29.980525 fps)
Rendered 120 frames in 4.002624 sec (29.980335 fps)
Rendered 180 frames in 6.003956 sec (29.980231 fps)
Rendered 240 frames in 8.005281 sec (29.980210 fps)
Rendered 300 frames in 10.006600 sec (29.980214 fps)
Rendered 360 frames in 12.007930 sec (29.980189 fps)
Rendered 420 frames in 14.009258 sec (29.980175 fps)
Rendered 480 frames in 16.010584 sec (29.980169 fps)
Rendered 540 frames in 18.011923 sec (29.980142 fps)
Rendered 600 frames in 20.013232 sec (29.980165 fps)
Rendered 660 frames in 22.014557 sec (29.980163 fps)
Rendered 720 frames in 24.015895 sec (29.980145 fps)
Rendered 780 frames in 26.017208 sec (29.980157 fps)
Rendered 840 frames in 28.018530 sec (29.980160 fps)
Rendered 900 frames in 30.019857 sec (29.980157 fps)
Rendered 960 frames in 32.021181 sec (29.980156 fps)
Marek Vasut Nov. 13, 2024, 7:55 p.m. UTC | #6
On 11/13/24 2:32 PM, Gajjar, Parth wrote:
> Hi Marek,

Hi,

> We tried running kmscube application with lima driver and it is working fine.
> Attaching application logs and boot logs.
> 
> We are using our 6.6 kernel.
> Meanwhile we will also check with upstream kernel.

Is this the heavily patched kernel version from
https://github.com/Xilinx/linux-xlnx branch xlnx/xlnx_rebase_v6.6_LTS
with
  865 files changed, 216895 insertions(+), 8276 deletions(-)
or an actual stock 6.6.40 ?

> Is it maybe possible some newer blob(s) enable both PP0 and PP1 internally to work around this clocking issue in Linux ?

The blobs I use are 2019.1, so what about this question ^ ?
Patch

diff --git a/drivers/clk/zynqmp/clk-gate-zynqmp.c b/drivers/clk/zynqmp/clk-gate-zynqmp.c
index b89e557371984..b013aa33e7abb 100644
--- a/drivers/clk/zynqmp/clk-gate-zynqmp.c
+++ b/drivers/clk/zynqmp/clk-gate-zynqmp.c
@@ -7,6 +7,7 @@ 
  * Gated clock implementation
  */
 
+#include <dt-bindings/clock/xlnx-zynqmp-clk.h>
 #include <linux/clk-provider.h>
 #include <linux/slab.h>
 #include "clk-zynqmp.h"
@@ -38,7 +39,13 @@  static int zynqmp_clk_gate_enable(struct clk_hw *hw)
 	u32 clk_id = gate->clk_id;
 	int ret;
 
-	ret = zynqmp_pm_clock_enable(clk_id);
+	if (clk_id == GPU_PP0_REF || clk_id == GPU_PP1_REF) {
+		ret = zynqmp_pm_clock_enable(GPU_PP0_REF);
+		if (!ret)
+			ret = zynqmp_pm_clock_enable(GPU_PP1_REF);
+	} else {
+		ret = zynqmp_pm_clock_enable(clk_id);
+	}
 
 	if (ret)
 		pr_debug("%s() clock enable failed for %s (id %d), ret = %d\n",
@@ -58,7 +65,13 @@  static void zynqmp_clk_gate_disable(struct clk_hw *hw)
 	u32 clk_id = gate->clk_id;
 	int ret;
 
-	ret = zynqmp_pm_clock_disable(clk_id);
+	if (clk_id == GPU_PP0_REF || clk_id == GPU_PP1_REF) {
+		ret = zynqmp_pm_clock_disable(GPU_PP1_REF);
+		if (!ret)
+			ret = zynqmp_pm_clock_disable(GPU_PP0_REF);
+	} else {
+		ret = zynqmp_pm_clock_disable(clk_id);
+	}
 
 	if (ret)
 		pr_debug("%s() clock disable failed for %s (id %d), ret = %d\n",