[v2] soc: qcom: Rework BCM_TCS_CMD macro

Message ID	20241028163403.522001-1-eugen.hristev@linaro.org (mailing list archive)
State	Superseded
Headers	show Received: from mail-wr1-f41.google.com (mail-wr1-f41.google.com [209.85.221.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EFE8A1DED67 for <linux-arm-msm@vger.kernel.org>; Mon, 28 Oct 2024 16:34:25 +0000 (UTC) From: Eugen Hristev <eugen.hristev@linaro.org> To: linux-arm-msm@vger.kernel.org Cc: andersson@kernel.org, konradybcio@kernel.org, linux-kernel@vger.kernel.org, linux-clk@vger.kernel.org, linux-pm@vger.kernel.org, djakov@kernel.org, sboyd@kernel.org, mturquette@baylibre.com, Eugen Hristev <eugen.hristev@linaro.org> Subject: [PATCH v2] soc: qcom: Rework BCM_TCS_CMD macro Date: Mon, 28 Oct 2024 18:34:03 +0200 Message-ID: <20241028163403.522001-1-eugen.hristev@linaro.org> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[v2] soc: qcom: Rework BCM_TCS_CMD macro \| expand [v2] soc: qcom: Rework BCM_TCS_CMD macro

Eugen Hristev Oct. 28, 2024, 4:34 p.m. UTC

Reworked BCM_TCS_CMD macro in order to fix warnings from sparse:

drivers/clk/qcom/clk-rpmh.c:270:28: warning: restricted __le32 degrades to integer
drivers/clk/qcom/clk-rpmh.c:270:28: warning: restricted __le32 degrades to integer

While at it, used le32_encode_bits which made the code easier to
follow and removed unnecessary shift definitions.

Signed-off-by: Eugen Hristev <eugen.hristev@linaro.org>
---
 drivers/clk/qcom/clk-rpmh.c           |  2 +-
 drivers/interconnect/qcom/bcm-voter.c |  2 +-
 include/soc/qcom/tcs.h                | 28 +++++++++++++--------------
 3 files changed, 16 insertions(+), 16 deletions(-)

Stephen Boyd Oct. 28, 2024, 5:56 p.m. UTC | #1

Quoting Eugen Hristev (2024-10-28 09:34:03)
> diff --git a/include/soc/qcom/tcs.h b/include/soc/qcom/tcs.h
> index 3acca067c72b..152947a922c0 100644
> --- a/include/soc/qcom/tcs.h
> +++ b/include/soc/qcom/tcs.h
> @@ -60,22 +63,19 @@ struct tcs_request {
>         struct tcs_cmd *cmds;
>  };
>  
> -#define BCM_TCS_CMD_COMMIT_SHFT                30
> -#define BCM_TCS_CMD_COMMIT_MASK                0x40000000
> -#define BCM_TCS_CMD_VALID_SHFT         29
> -#define BCM_TCS_CMD_VALID_MASK         0x20000000
> -#define BCM_TCS_CMD_VOTE_X_SHFT                14
> -#define BCM_TCS_CMD_VOTE_MASK          0x3fff
> -#define BCM_TCS_CMD_VOTE_Y_SHFT                0
> -#define BCM_TCS_CMD_VOTE_Y_MASK                0xfffc000
> +#define BCM_TCS_CMD_COMMIT_MASK                BIT(30)
> +#define BCM_TCS_CMD_VALID_MASK         BIT(29)
> +#define BCM_TCS_CMD_VOTE_MASK          GENMASK(13, 0)
> +#define BCM_TCS_CMD_VOTE_Y_MASK                GENMASK(13, 0)
> +#define BCM_TCS_CMD_VOTE_X_MASK                GENMASK(27, 14)
>  
>  /* Construct a Bus Clock Manager (BCM) specific TCS command */
>  #define BCM_TCS_CMD(commit, valid, vote_x, vote_y)             \
> -       (((commit) << BCM_TCS_CMD_COMMIT_SHFT) |                \
> -       ((valid) << BCM_TCS_CMD_VALID_SHFT) |                   \
> -       ((cpu_to_le32(vote_x) &                                 \
> -       BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_X_SHFT) |    \
> -       ((cpu_to_le32(vote_y) &                                 \
> -       BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_Y_SHFT))
> +       (le32_encode_bits(commit, BCM_TCS_CMD_COMMIT_MASK) |    \
> +       le32_encode_bits(valid, BCM_TCS_CMD_VALID_MASK) |       \
> +       le32_encode_bits(vote_x,        \
> +                       BCM_TCS_CMD_VOTE_X_MASK) |              \
> +       le32_encode_bits(vote_y,        \
> +                       BCM_TCS_CMD_VOTE_Y_MASK))

Why is cpu_to_le32() inside BCM_TCS_CMD at all? Is struct tcs_cmd::data
supposed to be marked as __le32?

Can the whole u32 be constructed and turned into an __le32 after setting
all the bit fields instead of using le32_encode_bits() multiple times?

Eugen Hristev Oct. 29, 2024, 1:12 p.m. UTC | #2

On 10/28/24 19:56, Stephen Boyd wrote:
> Quoting Eugen Hristev (2024-10-28 09:34:03)
>> diff --git a/include/soc/qcom/tcs.h b/include/soc/qcom/tcs.h
>> index 3acca067c72b..152947a922c0 100644
>> --- a/include/soc/qcom/tcs.h
>> +++ b/include/soc/qcom/tcs.h
>> @@ -60,22 +63,19 @@ struct tcs_request {
>>          struct tcs_cmd *cmds;
>>   };
>>   
>> -#define BCM_TCS_CMD_COMMIT_SHFT                30
>> -#define BCM_TCS_CMD_COMMIT_MASK                0x40000000
>> -#define BCM_TCS_CMD_VALID_SHFT         29
>> -#define BCM_TCS_CMD_VALID_MASK         0x20000000
>> -#define BCM_TCS_CMD_VOTE_X_SHFT                14
>> -#define BCM_TCS_CMD_VOTE_MASK          0x3fff
>> -#define BCM_TCS_CMD_VOTE_Y_SHFT                0
>> -#define BCM_TCS_CMD_VOTE_Y_MASK                0xfffc000
>> +#define BCM_TCS_CMD_COMMIT_MASK                BIT(30)
>> +#define BCM_TCS_CMD_VALID_MASK         BIT(29)
>> +#define BCM_TCS_CMD_VOTE_MASK          GENMASK(13, 0)
>> +#define BCM_TCS_CMD_VOTE_Y_MASK                GENMASK(13, 0)
>> +#define BCM_TCS_CMD_VOTE_X_MASK                GENMASK(27, 14)
>>   
>>   /* Construct a Bus Clock Manager (BCM) specific TCS command */
>>   #define BCM_TCS_CMD(commit, valid, vote_x, vote_y)             \
>> -       (((commit) << BCM_TCS_CMD_COMMIT_SHFT) |                \
>> -       ((valid) << BCM_TCS_CMD_VALID_SHFT) |                   \
>> -       ((cpu_to_le32(vote_x) &                                 \
>> -       BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_X_SHFT) |    \
>> -       ((cpu_to_le32(vote_y) &                                 \
>> -       BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_Y_SHFT))
>> +       (le32_encode_bits(commit, BCM_TCS_CMD_COMMIT_MASK) |    \
>> +       le32_encode_bits(valid, BCM_TCS_CMD_VALID_MASK) |       \
>> +       le32_encode_bits(vote_x,        \
>> +                       BCM_TCS_CMD_VOTE_X_MASK) |              \
>> +       le32_encode_bits(vote_y,        \
>> +                       BCM_TCS_CMD_VOTE_Y_MASK))
> 
> Why is cpu_to_le32() inside BCM_TCS_CMD at all? Is struct tcs_cmd::data
> supposed to be marked as __le32?
> 
> Can the whole u32 be constructed and turned into an __le32 after setting
> all the bit fields instead of using le32_encode_bits() multiple times?

I believe no. The fields inside the constructed TCS command should be 
little endian. If we construct the whole u32 and then convert it from 
cpu endinaness to little endian, this might prove to be incorrect as it 
would swap the bytes at the u32 level, while originally, the bytes for 
each field that was longer than 1 byte were swapped before being added 
to the constructed u32.
So I would say that the fields inside the constructed item are indeed 
le32, but the result as a whole is an u32 which would be sent to the 
hardware using an u32 container , and no byte swapping should be done 
there, as the masks already place the fields at the required offsets.
So the tcs_cmd.data is not really a le32, at least my acception of it.
Does this make sense ?

Eugen

Stephen Boyd Oct. 30, 2024, 12:40 a.m. UTC | #3

Quoting Eugen Hristev (2024-10-29 06:12:12)
> On 10/28/24 19:56, Stephen Boyd wrote:
> > Quoting Eugen Hristev (2024-10-28 09:34:03)
> >> diff --git a/include/soc/qcom/tcs.h b/include/soc/qcom/tcs.h
> >> index 3acca067c72b..152947a922c0 100644
> >> --- a/include/soc/qcom/tcs.h
> >> +++ b/include/soc/qcom/tcs.h
[....]
> >>   /* Construct a Bus Clock Manager (BCM) specific TCS command */
> >>   #define BCM_TCS_CMD(commit, valid, vote_x, vote_y)             \
> >> -       (((commit) << BCM_TCS_CMD_COMMIT_SHFT) |                \
> >> -       ((valid) << BCM_TCS_CMD_VALID_SHFT) |                   \
> >> -       ((cpu_to_le32(vote_x) &                                 \
> >> -       BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_X_SHFT) |    \
> >> -       ((cpu_to_le32(vote_y) &                                 \
> >> -       BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_Y_SHFT))
> >> +       (le32_encode_bits(commit, BCM_TCS_CMD_COMMIT_MASK) |    \
> >> +       le32_encode_bits(valid, BCM_TCS_CMD_VALID_MASK) |       \
> >> +       le32_encode_bits(vote_x,        \
> >> +                       BCM_TCS_CMD_VOTE_X_MASK) |              \
> >> +       le32_encode_bits(vote_y,        \
> >> +                       BCM_TCS_CMD_VOTE_Y_MASK))
> > 
> > Why is cpu_to_le32() inside BCM_TCS_CMD at all? Is struct tcs_cmd::data
> > supposed to be marked as __le32?
> > 
> > Can the whole u32 be constructed and turned into an __le32 after setting
> > all the bit fields instead of using le32_encode_bits() multiple times?
> 
> I believe no. The fields inside the constructed TCS command should be 
> little endian. If we construct the whole u32 and then convert it from 
> cpu endinaness to little endian, this might prove to be incorrect as it 
> would swap the bytes at the u32 level, while originally, the bytes for 
> each field that was longer than 1 byte were swapped before being added 
> to the constructed u32.
> So I would say that the fields inside the constructed item are indeed 
> le32, but the result as a whole is an u32 which would be sent to the 
> hardware using an u32 container , and no byte swapping should be done 
> there, as the masks already place the fields at the required offsets.
> So the tcs_cmd.data is not really a le32, at least my acception of it.
> Does this make sense ?
> 

Sort of? But I thought that the RPMh hardware was basically 32-bit
little-endian registers. That's why write_tcs_*() APIs in
drivers/soc/qcom/rpmh-rsc.c use writel() and readl(), right? The
cpu_to_le32() code that's there today is doing nothing, because the CPU
is little-endian 99% of the time. It's likely doing the wrong thing on
big-endian machines. Looking at commit 6311b6521bcc ("drivers: qcom: Add
BCM vote macro to header") it seems to have picked the macro version
from interconnect vs. clk subsystem. And commit b5d2f741077a
("interconnect: qcom: Add sdm845 interconnect provider driver") used
cpu_to_le32() but I can't figure out why.

If the rpmh-rsc code didn't use writel() or readl() I'd believe that the
data member is simply a u32 container. But those writel() and readl()
functions are doing a byte swap, which seems to imply that the data
member is a native CPU endian u32 that needs to be converted to
little-endian. Sounds like BCM_TCS_CMD() should just pack things into a
u32 and we can simply remove the cpu_to_l32() stuff in the macro?

Eugen Hristev Oct. 30, 2024, 8:28 a.m. UTC | #4

On 10/30/24 02:40, Stephen Boyd wrote:
> Quoting Eugen Hristev (2024-10-29 06:12:12)
>> On 10/28/24 19:56, Stephen Boyd wrote:
>>> Quoting Eugen Hristev (2024-10-28 09:34:03)
>>>> diff --git a/include/soc/qcom/tcs.h b/include/soc/qcom/tcs.h
>>>> index 3acca067c72b..152947a922c0 100644
>>>> --- a/include/soc/qcom/tcs.h
>>>> +++ b/include/soc/qcom/tcs.h
> [....]
>>>>    /* Construct a Bus Clock Manager (BCM) specific TCS command */
>>>>    #define BCM_TCS_CMD(commit, valid, vote_x, vote_y)             \
>>>> -       (((commit) << BCM_TCS_CMD_COMMIT_SHFT) |                \
>>>> -       ((valid) << BCM_TCS_CMD_VALID_SHFT) |                   \
>>>> -       ((cpu_to_le32(vote_x) &                                 \
>>>> -       BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_X_SHFT) |    \
>>>> -       ((cpu_to_le32(vote_y) &                                 \
>>>> -       BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_Y_SHFT))
>>>> +       (le32_encode_bits(commit, BCM_TCS_CMD_COMMIT_MASK) |    \
>>>> +       le32_encode_bits(valid, BCM_TCS_CMD_VALID_MASK) |       \
>>>> +       le32_encode_bits(vote_x,        \
>>>> +                       BCM_TCS_CMD_VOTE_X_MASK) |              \
>>>> +       le32_encode_bits(vote_y,        \
>>>> +                       BCM_TCS_CMD_VOTE_Y_MASK))
>>>
>>> Why is cpu_to_le32() inside BCM_TCS_CMD at all? Is struct tcs_cmd::data
>>> supposed to be marked as __le32?
>>>
>>> Can the whole u32 be constructed and turned into an __le32 after setting
>>> all the bit fields instead of using le32_encode_bits() multiple times?
>>
>> I believe no. The fields inside the constructed TCS command should be
>> little endian. If we construct the whole u32 and then convert it from
>> cpu endinaness to little endian, this might prove to be incorrect as it
>> would swap the bytes at the u32 level, while originally, the bytes for
>> each field that was longer than 1 byte were swapped before being added
>> to the constructed u32.
>> So I would say that the fields inside the constructed item are indeed
>> le32, but the result as a whole is an u32 which would be sent to the
>> hardware using an u32 container , and no byte swapping should be done
>> there, as the masks already place the fields at the required offsets.
>> So the tcs_cmd.data is not really a le32, at least my acception of it.
>> Does this make sense ?
>>
> 
> Sort of? But I thought that the RPMh hardware was basically 32-bit
> little-endian registers. That's why write_tcs_*() APIs in
> drivers/soc/qcom/rpmh-rsc.c use writel() and readl(), right? The
> cpu_to_le32() code that's there today is doing nothing, because the CPU
> is little-endian 99% of the time. It's likely doing the wrong thing on
> big-endian machines. Looking at commit 6311b6521bcc ("drivers: qcom: Add
> BCM vote macro to header") it seems to have picked the macro version
> from interconnect vs. clk subsystem. And commit b5d2f741077a
> ("interconnect: qcom: Add sdm845 interconnect provider driver") used
> cpu_to_le32() but I can't figure out why.
> 
> If the rpmh-rsc code didn't use writel() or readl() I'd believe that the
> data member is simply a u32 container. But those writel() and readl()
> functions are doing a byte swap, which seems to imply that the data
> member is a native CPU endian u32 that needs to be converted to
> little-endian. Sounds like BCM_TCS_CMD() should just pack things into a
> u32 and we can simply remove the cpu_to_l32() stuff in the macro?

This review [1] from Evan Green on the original patch submission 
requested the use of cpu_to_le32

So that's how it ended up there.


[1] 
https://lore.kernel.org/linux-kernel//20180806225252.GQ30024@minitux/T/#mab6b799b3f9b51725c804a65f3580ef8894205f2

Stephen Boyd Nov. 8, 2024, 7 p.m. UTC | #5

Quoting Eugen Hristev (2024-10-30 01:28:14)
> On 10/30/24 02:40, Stephen Boyd wrote:
> > 
> > If the rpmh-rsc code didn't use writel() or readl() I'd believe that the
> > data member is simply a u32 container. But those writel() and readl()
> > functions are doing a byte swap, which seems to imply that the data
> > member is a native CPU endian u32 that needs to be converted to
> > little-endian. Sounds like BCM_TCS_CMD() should just pack things into a
> > u32 and we can simply remove the cpu_to_l32() stuff in the macro?
> 
> This review [1] from Evan Green on the original patch submission 
> requested the use of cpu_to_le32
> 
> So that's how it ended up there.
> 

Thanks. I still don't see why this can't just be treated as a u32 and
then we have writel() take care of it for us.

Eugen Hristev Nov. 11, 2024, 1:05 p.m. UTC | #6

On 11/8/24 21:00, Stephen Boyd wrote:
> Quoting Eugen Hristev (2024-10-30 01:28:14)
>> On 10/30/24 02:40, Stephen Boyd wrote:
>>> 
>>> If the rpmh-rsc code didn't use writel() or readl() I'd believe
>>> that the data member is simply a u32 container. But those
>>> writel() and readl() functions are doing a byte swap, which
>>> seems to imply that the data member is a native CPU endian u32
>>> that needs to be converted to little-endian. Sounds like
>>> BCM_TCS_CMD() should just pack things into a u32 and we can
>>> simply remove the cpu_to_l32() stuff in the macro?
>> 
>> This review [1] from Evan Green on the original patch submission 
>> requested the use of cpu_to_le32
>> 
>> So that's how it ended up there.
>> 
> 
> Thanks. I still don't see why this can't just be treated as a u32
> and then we have writel() take care of it for us.

If the values are in the wrong endianness, e.g. 0xff11 instead of 
0x11ff, the corresponding field would be filled up wrongly, even 
possibly writing unwanted bits. vote_x and vote_y have a mask of length 
14, so there is one byte and another 6 more bits. If the endianness of 
the value is not correct, the one byte might end up written over the 6 
bits and 2 extra bits which are supposed to be for another field.
In my example 0x11 should be in the first 6 bits and the 0xff in the 
next byte, but if the endianness of the cpu is different, we might write 
0xff on the 6 bit field.
So we must ensure that the multi-byte fields are in the correct 
endianness that the hardware expects.

In other words, writel does not know about the multi-byte fields inside 
this u32 which have a specific bit shift, and those fields are expected 
to be in le32 order written to the hardware. Whether or not the cpu is 
le32 is not important because using cpu_to_le32 will make it safe either 
way.

I apologize for my not so great explanation

Stephen Boyd Nov. 19, 2024, 11:32 p.m. UTC | #7

Quoting Eugen Hristev (2024-11-11 05:05:02)
> 
> 
> On 11/8/24 21:00, Stephen Boyd wrote:
> > Quoting Eugen Hristev (2024-10-30 01:28:14)
> >> On 10/30/24 02:40, Stephen Boyd wrote:
> >>> 
> >>> If the rpmh-rsc code didn't use writel() or readl() I'd believe
> >>> that the data member is simply a u32 container. But those
> >>> writel() and readl() functions are doing a byte swap, which
> >>> seems to imply that the data member is a native CPU endian u32
> >>> that needs to be converted to little-endian. Sounds like
> >>> BCM_TCS_CMD() should just pack things into a u32 and we can
> >>> simply remove the cpu_to_l32() stuff in the macro?
> >> 
> >> This review [1] from Evan Green on the original patch submission 
> >> requested the use of cpu_to_le32
> >> 
> >> So that's how it ended up there.
> >> 
> > 
> > Thanks. I still don't see why this can't just be treated as a u32
> > and then we have writel() take care of it for us.
> 
> If the values are in the wrong endianness, e.g. 0xff11 instead of 
> 0x11ff, the corresponding field would be filled up wrongly, even 
> possibly writing unwanted bits. vote_x and vote_y have a mask of length 
> 14, so there is one byte and another 6 more bits. If the endianness of 
> the value is not correct, the one byte might end up written over the 6 
> bits and 2 extra bits which are supposed to be for another field.
> In my example 0x11 should be in the first 6 bits and the 0xff in the 
> next byte, but if the endianness of the cpu is different, we might write 
> 0xff on the 6 bit field.
> So we must ensure that the multi-byte fields are in the correct 
> endianness that the hardware expects.

The vote_x field looks like it goes from bits 14 to 27. No matter the
endian format of the _word_, bits 14 to 27 are for the vote_x field. So
if I set bit 15 and 17 in a big-endian format word (vote_x is decimal
10), pass that 32-bit word to writel(), swap the bytes, it is still set
as bit 15 and 17 in little-endian format. That's because when I read the
32-bit word as a little-endian machine, the first byte is for bits 0 to
7, second byte is for bits 8 to 15, third byte is for 16 to 23, and
fourth byte is for bits 24 to 31. The hardware assembles the 32-bit word
out of that byte order, knowing that bits are mapped that way. Once I
have the 32-bit word, it can be shifted right 14 times so that the
vote_x field is from 0 to 13, and bits 1 and 3 are set, i.e. decimal 10.

Fields that span bytes don't matter here. The hardware is going to read
the word in the format that it is in, which is the order of bytes, and
assemble a full 32-bit word out of it. We're just setting bits in the
field that's shifted so many bits because it's part of a word. The order
of the bits isn't changing.

> 
> In other words, writel does not know about the multi-byte fields inside 
> this u32 which have a specific bit shift, and those fields are expected 
> to be in le32 order written to the hardware. Whether or not the cpu is 
> le32 is not important because using cpu_to_le32 will make it safe either 
> way.
> 
> I apologize for my not so great explanation
> 

I hope my explanation has helped. Long story short, the cpu_to_le32()
usage here is wrong. Typically we try to operate with a type that's the
same size and native format for as long as possible (u32), partially for
performance reasons but also to make it easier to understand. When it
comes time to write that value to the hardware, write it in the hardware
format (writel), and read from the hardware in the hardware format
(readl). Doing this lets you avoid thinking about the endianness almost
entirely, and makes it so that code doesn't have to be rewritten when
running on a different endian CPU to avoid suffering performance
penalties with all the byte swapping.

[v2] soc: qcom: Rework BCM_TCS_CMD macro

Commit Message

Comments

Patch